Multiple stochastic integral expansions are applied to the problem of
filtering a signal observed in additive noise. It is shown that the optimal
mean-square estimate may be represented as a ratio of two multiple integral
series. A formula for expanding the product of two multiple integrals is
developed and applied to deriving equations for the kernels of best, finite
expansion approximations to the optimal filter. These equations are studied
in detail in the quadratic case.
SIGNIFICANCE AND EXPLANATION


A  common  problem  in  the  analysis  of  stochastic  systems  is  the  estimation 
of  a  stochastic  process  given  only  noise-corrupted  or  incomplete  observations. 
Examples  occur  in  communications  theory  when  one  wants  to  estimate  a  signal 
sent  over  a  noisy  channel  or  in  time  series  problems.  If  x(t)  is  a 
stochastic  process  denoting  the  signal,  the  observations  are  typically 
modelled  by 

$$y(t) = \int_0^t h(x(s))\,ds + W(t),$$

where  W(t)  is  an  independent  increments  "noise"  process,  usually  Brownian 
motion.  The  problem  of  filtering  is  to  build  an  estimate,  i.e.,  filter,  of 
x(t)  using  the  observations  y(s),  s  <  t.  Theoretical  characterizations  of 
best  mean-square  estimates  are  known,  but  can  be  translated  into  effective 
solutions  only  in  special  instances.  In  this  paper,  the  general  filtering 
problem  is  treated  by  attempting  to  expand  filters  in  series  of  multiple 
stochastic  integrals  of  the  form 

$$\int_0^t \int_0^{s_1} \cdots \int_0^{s_{r-1}} a(t, s_1, \ldots, s_r)\, dy(s_r) \cdots dy(s_1).$$

Two primary issues raised by this idea are considered: representation of the
optimal  mean-square  estimate  by  multiple  integral  expansions,  and  construction 
of  suboptimal  estimates  using  a  finite  number  of  multiple  integrals.  It  is 
shown  that  expansion  of  the  optimal  filter  is  indeed  possible,  and  a  method  is 
presented  for  finding  best,  finite  expansion  estimates.  A  rudimentary  algebra 
of  multiple  integral  expansions  is  first  developed  as  a  tool  to  prove  these 
results. 

The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC,  and  not  with  the  author  of  this  report. 


MULTIPLE  INTEGRAL  EXPANSIONS  FOR  NONLINEAR  FILTERING* 

Daniel  Ocone 

1.1 Introduction

In the additive noise model of filtering, information about a stochastic process
x(t), t ≥ 0, called the signal, is received through observations of the form

$$y(t) = \int_0^t h(x(s))\,ds + w(t), \qquad t \ge 0.$$

Here w(t) is a noise term that corrupts the signal, and it is usually assumed to be a Brownian
motion. The filtering problem is to estimate from the observations y(s), 0 ≤ s ≤ t, a
given moment f(x(t)) of the signal at time t, and, if estimators minimizing mean-
square-error are desired, this means calculating the conditional mean $E\{f(x(t)) \mid F_t^y\}$,
where $F_t^y := \sigma\{y(s) \mid 0 \le s \le t\}$. $E\{f(x(t)) \mid F_t^y\}$ is henceforth referred to as
the optimal filter. Two fundamental characterizations of the optimal filter are available:
a) a Bayes formula for $E\{f(x(t)) \mid F_t^y\}$ as the ratio of two conditional, functional integrals
(Kallianpur, Striebel [9], cf. §1.2 of this paper); b) in the case that x(t) is Markov,
a representation of the optimal filter as a stochastic integral against the innovations
process, $\nu(t) = y(t) - \int_0^t E\{h(x(s)) \mid F_s^y\}\,ds$, the stochastic integrand being adapted to
the observation process (Fujisaki, Kallianpur, and Kunita [2]). However, though
theoretically deep, these results lead to explicit and analytically computable solutions
only in special instances.


This paper studies the application of multiple stochastic integral expansions to the
filtering problem. Any filter, optimal or suboptimal, is actually a non-anticipating
functional of the observation process, thus suggesting that filters be represented and
analyzed within a framework for functional expansions. Multiple stochastic integrals prove
useful for this purpose. In fact, their definition originates in Wiener's homogeneous
chaos theory, which constructs orthogonal decompositions of spaces of finite-variance
functionals of Gaussian processes (cf. Kallianpur [8] and Hida [5]). In the Brownian
motion case, each subspace of the decomposition corresponds to the space of multiple
stochastic integrals of a given order, and, thus, Wiener's theory shows that any $L^2$-
functional of the Brownian motion may be expanded in a series of multiple integrals.
Multiple integrals have been used already to solve a number of specific estimation
problems. Marcus, Mitter, and Ocone [13] apply the homogeneous chaos theory to compute
conditional statistics of polynomial functionals of a Gauss-Markov process observed in
white noise, and Hida and Kallianpur [6] use multiple integrals to predict non-linear
functions of Brownian signals given perfect observations. In cumulant approximations of
the conditional density in filtering, Eterno [1] also derives expressions using multiple
integrals. Here, we seek to apply multiple integrals of the form


$$\int_0^t \int_0^{s_1} \cdots \int_0^{s_{n-1}} a(t, s_1, \ldots, s_n)\, dy(s_n) \cdots dy(s_1),$$


where a(...) is deterministic, to the general filtering problem. We focus on two basic
issues: the expansion of the optimal filter by expressions involving multiple integrals,
and the construction of best suboptimal filters having a finite multiple integral expansion
of specified order. It is important to observe that the stochastic integrals we employ are
formed from the observation process and not the innovations process. At first, integration
against innovations might appear to be an attractive idea because the innovations process
is Brownian, integrals of different orders are thus orthogonal, and homogeneous chaos
theory can be applied. However, in practice the innovations process is not available, since
its construction requires the estimate $E\{h(x(t)) \mid F_t^y\}$, whose computation is generally a
difficult filtering problem itself. Integrals using y(·) directly are thus more natural
but, due to their more general, usually non-Gaussian character, are more difficult to
apply. For example, in suboptimal estimation one might like to project random variables on
a sum of spaces of multiple integrals. This is easily done for Brownian integrals, using
the orthogonality of different order integrals and explicit formulae to calculate the
integrands, but not so easily for more general integrals, where the orthogonality structure
and kernel formulae are lost. In this paper we describe a method for analyzing y(·)-
based integrals that, in particular, allows resolution of this projection problem.

The paper is organized as follows. §1.2 introduces the precise filtering model we
consider and recalls the Kallianpur-Striebel formula for the optimal estimate. A central
feature of this formula is the fact that the y(·) process is absolutely continuous with
respect to Brownian motion. Transformations of measure so that y(·) becomes Brownian
will be an underlying component of our analysis of y(·)-based integrals. §2 is a self-
contained treatment of multiple integrals of Brownian and observation processes. We define
multiple stochastic integrals, prove technical lemmas for later use, and develop some
useful properties of the integrals. Of particular importance is the multiplication formula
(theorem 2.1), which shows how to express the product of multiple integrals in a multiple
integral expansion, thus providing a rudimentary algebra for handling expansions. We
present the applications to filtering in section 3. In §3.1, we show that the optimal
filter can be represented as the ratio of two multiple integral expansions, essentially by
expanding the Kallianpur-Striebel formula. §3.2 addresses the issue of finding the best
(mean square) estimate of the form


$$a_0(t) + \int_0^t a_1(t, s_1)\,dy(s_1) + \cdots + \int_0^t \int_0^{s_1} \cdots \int_0^{s_{r-1}} a_r(t, s_1, \ldots, s_r)\,dy(s_r) \cdots dy(s_1).$$


By combining the expansions of §3.1 and the multiplication formula, we derive a system of
linear integral equations for the kernels. In effect, the method of analysis is
to transform measures to a space on which y(·) is a Brownian process and then to apply
the multiplication formula to discover the effect of the Radon-Nikodym derivative so
introduced. The remaining sections apply these results, first to rederiving the Kalman
filter, second to finding best quadratic filters.

It is a pleasure to thank Professor S. K. Mitter for suggesting this problem and for
inspiring and guiding the research.

1.2  Filtering  preliminaries 

The precise filtering model to be considered is as follows. Let the underlying
probability space be denoted $(\Omega, F, P)$. For $0 < T < \infty$, let $\{x(t) \mid t \in [0,T]\}$ be
a measurable, real-valued process on $(\Omega, F, P)$, h(s,x) a Borel function on $[0,T] \times R$,
and w(t) a standard Brownian motion on $(\Omega, F, P)$, such that

i) w(·) is independent of x(·)

ii) $E \int_0^T h^2(s, x(s))\,ds < \infty$.

Set

$$y(t) = \int_0^t h(s, x(s))\,ds + w(t), \qquad t \in [0,T].$$

Such a process y(·) will be called an observation semi-martingale.

Let $f(t; x(s), s \le t)$ be a non-anticipating functional of x(·) such that
$E f(t; x(s), s \le t) < \infty$, $\forall\, t \in [0,T]$, and define $F_t^y := \sigma\{y(s) \mid 0 \le s \le t\}$ and
$F_t^{x,y} := \sigma\{x(s), y(s) \mid 0 \le s \le t\}$.

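Theorem 1.1 (Kallianpur, Striebel [9]). Let

$$\frac{dP_0}{dP} = \exp\Big(-\int_0^T h(s,x(s))\,dw(s) - \frac{1}{2}\int_0^T h^2(s,x(s))\,ds\Big).$$

Then (i) $P_0$ is a probability measure, and $P$ and $P_0$ are mutually absolutely
continuous.

(ii) $E_0\{dP/dP_0 \mid F_t^{x,y}\} = \exp\big(\int_0^t h(s,x(s))\,dy(s) - \frac{1}{2}\int_0^t h^2(s,x(s))\,ds\big)$.

(iii) On $(\Omega, P_0)$, y(·) is a Brownian motion independent of x(·).

(iv) x(·) has the same law on $(\Omega, P_0)$ as on $(\Omega, P)$.

(v)
$$E\{f(t; x(s), s \le t) \mid F_t^y\} = \frac{E_0\{f(t; x(s), s \le t)\,\exp(\int_0^t h\,dy - \frac{1}{2}\int_0^t h^2\,ds) \mid F_t^y\}}{E_0\{\exp(\int_0^t h\,dy - \frac{1}{2}\int_0^t h^2\,ds) \mid F_t^y\}}. \qquad (1.2)$$
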

For  a  nice  treatment  of  this  theorem,  see  Wong  [21].  It  is  the  principal  theoretical 
tool  for  our  work  in  filtering,  for  it  explicitly  characterises  the  optimal  filter  as  a 
functional  integral  and  it  establishes  that  y( •)  is  mutually  absolutely  continuous  with 
Brownian  motion. 
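
To make the ratio form of the Kallianpur-Striebel formula concrete, the following minimal Python sketch evaluates it in a deliberately simple special case that is not treated in the paper: the signal is a time-constant random variable x taking finitely many values, and h(x) = x by default. Under the reference measure the observation is independent of x, so both conditional expectations reduce to finite sums weighted by exp(h(x) y(t) - h(x)^2 t / 2). The function name and its arguments are illustrative assumptions, not notation from the text.

```python
import math


def kallianpur_striebel_estimate(y_t, t, values, probs,
                                 f=lambda x: x, h=lambda x: x):
    """Posterior mean of f(x) given the observation y_t at time t.

    Assumes a time-constant signal x with P(x = values[i]) = probs[i],
    so the likelihood ratio depends on the path of y only through y_t.
    """
    numerator = 0.0
    denominator = 0.0
    for x, p in zip(values, probs):
        # For constant x: L_t = exp(h(x) * y_t - 0.5 * h(x)**2 * t).
        weight = p * math.exp(h(x) * y_t - 0.5 * h(x) ** 2 * t)
        numerator += f(x) * weight
        denominator += weight
    # Ratio of the two reference-measure expectations.
    return numerator / denominator
```

For a symmetric two-point signal x = ±1 the ratio collapses to tanh(y(t)), which gives a convenient sanity check of the formula.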

Finally,  we  remark  that  we  restrict  ourselves  here  to  scalar  processes  only  in  the 
interests  of  notational  simplicity.  The  techniques  to  be  discussed  extend  easily  to  the 
vector  case. 


2.  Multiple  Integrals 
2.1 Definitions

The concept of a multiple Wiener integral derives ultimately from Wiener's work on
'homogeneous chaos' decompositions of functionals of Brownian motion; however, the modern
definition and theory are due to Ito [7]. Here we will define multiple integrals by
iteration of stochastic integration. While this differs from Ito's construction, it leads,
as Ito [7] notes, to the same result modulo a multiplicative constant. The iterative
definition is convenient for our calculations.

Let $(b(t), F_t^b)$ be a standard Wiener process with its associated family of
σ-algebras $F_t^b = \sigma\{b(s) : s \le t\}$. Recall that, for a jointly measurable, $F_t^b$-adapted
process $\phi(t,\omega)$ such that $E\int_0^T \phi^2(s)\,ds < \infty$, the Ito integral $\int_0^t \phi(s)\,db(s)$ has the
properties

$$E \int_0^t \phi(s)\,db(s) = 0, \qquad t \le T \qquad (2.1)$$

$$E\Big(\int_0^t \phi(s)\,db(s)\Big)^2 = E \int_0^t \phi^2(s)\,ds, \qquad t \le T. \qquad (2.2)$$

Let

$$\hat L^2([0,T]^r) = \{f \in L^2([0,T]^r) \mid f \text{ is symmetric}\}.$$

This will be the set of integrands for the rth order integral. If $f \in \hat L^2([0,T]^r)$,
$f(\sigma, \ldots) \in \hat L^2([0,T]^{r-1})$ will denote the section of f at σ.

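Definition 2.1 Let $f \in \hat L^2([0,T]^r)$, $t \le T$. $I_t^r(f)$ is defined recursively by
(taking $\hat L^2([0,T]^0)$ to be the constants)

$$I_t^0(f) = f \qquad \text{for } r = 0,$$

$$I_t^r(f) = \int_0^t I_s^{r-1}(f(s, \ldots))\,db(s). \qquad (2.3)$$

$I_t^r(f)$ is the rth order multiple integral of f with respect to b(·) up to time
t. Alternately stated,

$$I_t^r(f) = \int_0^t \int_0^{s_1} \cdots \int_0^{s_{r-1}} f(s_1, \ldots, s_r)\,db(s_r) \cdots db(s_1).$$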
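
The recursive definition of the rth order integral, I_t^r(f) = ∫_0^t I_s^{r-1}(f(s,...)) db(s), translates directly into a discrete sketch. The Python code below is an illustration under assumed conventions (a uniform time grid, left-endpoint evaluation, and hypothetical argument names); it is not a construction taken from the paper. The inner integral at cell i uses only increments strictly before that cell, which mirrors the non-anticipating character of the Ito integral.

```python
import numpy as np


def iterated_integral(kernel, db, dt, r, upto=None, fixed=()):
    """Left-point (Ito) approximation of the r-th order iterated integral.

    kernel : callable of r time arguments (s_1, ..., s_r), deterministic
    db     : 1-D array of increments of the integrator on a uniform grid
    dt     : grid step (assumed discretization parameter)
    r      : order of the integral
    upto   : number of increments to use (defaults to all of them)
    fixed  : outer time arguments already chosen by the recursion
    """
    n = len(db) if upto is None else upto
    if r == 0:
        # Order zero: the kernel evaluated at the fixed arguments.
        return float(kernel(*fixed))
    total = 0.0
    for i in range(n):
        s = i * dt  # left endpoint of the i-th cell
        # The inner integral runs over increments strictly before cell i.
        inner = iterated_integral(kernel, db, dt, r - 1,
                                  upto=i, fixed=fixed + (s,))
        total += inner * db[i]
    return total
```

With the constant kernel 1 and r = 2, the discrete sum equals ((Σ db)^2 - Σ db^2)/2 exactly, the discrete counterpart of the familiar (b(t)^2 - t)/2. The cost grows like n^r, so the sketch is only meant for small grids.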


To ensure that the right-hand side of (2.3) is well defined it suffices to show that
$I_s^{r-1}(f(s, \ldots))$ has a jointly measurable version with bounded $L^2(\Omega \times [0,T], P \times \lambda)$ norm.
This may be done by proving recursively, along with the definition, that

$$E\,I_t^r(f)\,I_t^r(g) = \frac{1}{r!}(f,g)_{L^2([0,t]^r)} = \int_0^t \int_0^{s_1} \cdots \int_0^{s_{r-1}} f(s_1, \ldots, s_r)\,g(s_1, \ldots, s_r)\,ds_r \cdots ds_1 \qquad (2.4)$$

for all $f, g \in \hat L^2([0,T]^r)$. This is a consequence of (2.2). Then, if $f_n$ is a sequence of
symmetrizations of separable functions, such that $f_n \to f$ in $L^2$-norm, $I_s^{r-1}(f_n(s, \ldots))$ is
jointly measurable for all n and

$$E \int_0^T [I_s^{r-1}(f_n(s, \ldots)) - I_s^{r-1}(f(s, \ldots))]^2\,ds \to 0.$$

Thus we can find a jointly measurable version of $I_s^{r-1}(f(s, \ldots))$.

It is important to note that multiple integrals have zero mean and that integrals of
different orders are orthogonal; that is, for $f \in \hat L^2([0,T]^r)$, $g \in \hat L^2([0,T]^q)$, $q \ne r$, $t \le T$,

$$E\,I_t^r(f) = 0, \qquad E[I_t^r(f)\,I_t^q(g)] = 0. \qquad (2.5)$$

These follow from repeated application of (2.1) and (2.2).

Remark. The requirement of symmetry for the integrands is not necessary, since integration
is carried out only over the set where $s_1 > s_2 > \cdots > s_r$. However, this convention is
convenient in formulating the multiplication formula in section 2.2.

The  following  technical  lemma,  a  Fubini  result  on  interchanging  db  and  ds 
integrations,  is  needed  later. 


Lemma 2.1 Let $f \in \hat L^2([0,T]^r)$. For $t \le T$,

$$\int_0^t I_s^{r-1}(f(s, \ldots))\,ds = \int_0^t \cdots \int_0^{s_{r-2}} \int_{s_1}^t f(u, s_1, \ldots, s_{r-1})\,du\,db(s_{r-1}) \cdots db(s_1). \qquad (2.6)$$

Proof Define $g_t(s_1, \ldots, s_{r-1}) = \int_{s_1}^t f(u, s_1, \ldots, s_{r-1})\,du$. The r.h.s. of (2.6) is $I_t^{r-1}(g_t)$.
To prove the lemma, simply verify that

$$E\Big[\int_0^t I_s^{r-1}(f(s, \ldots))\,ds - I_t^{r-1}(g_t)\Big]^2 = 0$$

by using the basic properties (2.5) of the multiple stochastic integral.


For filtering applications, we must also define multiple integrals

$$\int_0^t \cdots \int_0^{s_{r-1}} f(s_1, \ldots, s_r)\,dy(s_r) \cdots dy(s_1) \qquad (2.7)$$

with respect to observation semi-martingales

$$y(t) = \int_0^t h(s, x(s))\,ds + w(t) \qquad (2.8)$$

(the assumptions of section 1 are assumed to be in force). Such integrals are known and
have been studied in the context of semi-martingale theory. However, the special structure
of (2.8) allows a simple definition which we present here. This takes advantage of the
absolute continuity of the y(·) process with respect to Brownian motion; as stated above,
if $(\Omega, F, P)$ is the underlying probability space, there exists a probability measure $P_0$
such that $P_0 \ll P$, $P \ll P_0$, and y(·) is Brownian on $(\Omega, F, P_0)$. Therefore, for
$f \in \hat L^2([0,T]^r)$, we define (2.7) as the random variable which on $(\Omega, F, P_0)$ equals the
Brownian motion integral defined above. We call this integral $I_t^r(f)$ without reference to
measure or process, which should always be clear from context.

The iterative property of $I_t^r(f)$ remains true for dy integrals; that is,

$$I_t^r(f) = \int_0^t I_s^{r-1}(f(s, \ldots))\,dy(s) \qquad (2.9)$$

where the integral in (2.9) is defined with respect to the semi-martingale y(·) in the
usual sense (see Liptser and Shiryayev [11]). However, neither the expression (2.4) nor
the orthogonality of different orders, (2.5), now holds. Instead, we can prove the
following lemma, which is useful in section 3.2. (In this discussion, we abbreviate
h(s,x(s)) by h(s).)

Lemma 2.2 Suppose $E(\int_0^T h^2(s)\,ds)^r < \infty$. Then for $k \le r$ and $f \in \hat L^2([0,T]^k)$

(i) $E[I_t^k(f)]^2 \le M_k \|f\|_2^2$; $M_k < \infty$ is independent of f

(ii) $E\,I_t^k(f) = \int_0^t \cdots \int_0^{s_{k-1}} f(s_1, \ldots, s_k)\,E[h(s_1) \cdots h(s_k)]\,ds_k \cdots ds_1$


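Proof

We will actually prove by induction the more general result: for $r \ge \ell \ge k$ and $\sigma_i \in [0,T]$,

$$E[h(\sigma_\ell) \cdots h(\sigma_{k+1})\,I_{\sigma_k}^k(f)]^2 \le a_{\ell,k}(\sigma_{k+1}, \ldots, \sigma_\ell)\,\|f\|^2 \qquad (2.10)$$

where $a_{\ell,k} \in L^1([0,T]^{\ell-k})$, and

$$E[h(\sigma_\ell) \cdots h(\sigma_{k+1})\,I_{\sigma_k}^k(f)] = \int_0^{\sigma_k} \int_0^{s_1} \cdots \int_0^{s_{k-1}} f(s_1, \ldots, s_k)\,E[h(s_1) \cdots h(s_k)\,h(\sigma_{k+1}) \cdots h(\sigma_\ell)]\,ds_k \cdots ds_1. \qquad (2.11)$$

Lemma 2.2 is the case ℓ = k for every k ≤ r. First we demonstrate (2.10) and (2.11) for
r ≥ ℓ ≥ k = 1, using the iterative formula of (2.9) and the independence of x(·) and
w(·). Thus

$$E\Big[h(\sigma_\ell) \cdots h(\sigma_2) \int_0^{\sigma_1} f(s)\,dy(s)\Big]^2 = E\Big[h(\sigma_\ell) \cdots h(\sigma_2)\Big\{\int_0^{\sigma_1} f(s)h(s)\,ds + \int_0^{\sigma_1} f(s)\,dw(s)\Big\}\Big]^2$$

$$\le \Big(2E \int_0^T [h(\sigma_\ell) \cdots h(\sigma_2)\,h(s)]^2\,ds + 2E[h(\sigma_\ell)^2 \cdots h(\sigma_2)^2]\Big)\|f\|^2 =: a_{\ell,1}(\sigma_2, \ldots, \sigma_\ell)\,\|f\|^2. \qquad (2.12)$$

To derive the inequality in (2.12), the Cauchy-Schwarz inequality is used several times.
$a_{\ell,1} \in L^1([0,T]^{\ell-1})$ for ℓ ≤ r because $E[\int_0^T h^2(s)\,ds]^r < \infty$. Likewise

$$E\Big[h(\sigma_\ell) \cdots h(\sigma_2) \int_0^{\sigma_1} f(s)\,dy(s)\Big] = E\Big[h(\sigma_\ell) \cdots h(\sigma_2)\Big\{\int_0^{\sigma_1} f(s)h(s)\,ds + \int_0^{\sigma_1} f(s)\,dw(s)\Big\}\Big] = \int_0^{\sigma_1} f(s)\,E[h(s)\,h(\sigma_2) \cdots h(\sigma_\ell)]\,ds. \qquad (2.13)$$

Now suppose (2.10) and (2.11) are true for a fixed k and all ℓ, r ≥ ℓ ≥ k. Again,
using $I_t^{k+1}(f) = \int_0^t I_s^k(f(s, \ldots))\,dy(s)$, Cauchy-Schwarz, and induction,

$$E[h(\sigma_\ell) \cdots h(\sigma_{k+2})\,I_{\sigma_{k+1}}^{k+1}(f)]^2 \le 2T \int_0^{\sigma_{k+1}} E[h(\sigma_\ell) \cdots h(\sigma_{k+2})\,h(s)\,I_s^k(f(s, \ldots))]^2\,ds + 2 \int_0^{\sigma_{k+1}} E[h(\sigma_\ell) \cdots h(\sigma_{k+2})\,I_s^k(f(s, \ldots))]^2\,ds$$

$$\le \Big[2T \int_0^T a_{\ell,k}(s, \sigma_{k+2}, \ldots, \sigma_\ell)\,ds + 2\,a_{\ell-1,k}(\sigma_{k+2}, \ldots, \sigma_\ell)\Big]\|f\|^2 =: a_{\ell,k+1}(\sigma_{k+2}, \ldots, \sigma_\ell)\,\|f\|^2.$$

By induction, $a_{\ell,k+1} \in L^1([0,T]^{\ell-k-1})$. Thus (2.10) is true for k + 1. That (2.10) holds
for k also implies

$$E \int_0^T [I_s^k(f(s, \ldots))]^2\,ds < \infty.$$

Thus, from (2.1),

$$E \int_0^t I_s^k(f(s, \ldots))\,dw(s) = 0, \qquad \text{for } t \le T.$$

With the aid of this equality we can prove that (2.11) also is true for k+1. This
completes the induction step. Induction stops at k = r since we have required
r ≥ ℓ ≥ k in order to apply $E(\int_0^T h^2(s)\,ds)^r < \infty$.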
2.3 The multiplication formula.

As above let $(b(t), F_t^b)$ denote a standard Brownian motion. If $\psi(b(s), s \le t)$ is a
functional of b(·) up to time t, we want to consider expansions of the form

$$\psi = \sum_{r=0}^{\infty} I_t^r(f_r).$$

(If $\psi \in L^2(\Omega, F_t^b, P)$ such a representation exists, uniquely, and the series converges to
ψ in mean-square; see Ito [7] or Hida [5].) Rules prescribing how this representation
changes as various operations are performed on ψ must be available if multiple integral
expansions are to be of use in applications. In this section, we address the simplest
problem in this direction. If $f \in \hat L^2([0,T]^r)$ and $g \in \hat L^2([0,T]^q)$, what, if any, are the
kernels $u_i$ such that

$$I_t^r(f)\,I_t^q(g) = \sum_{i \ge 0} I_t^i(u_i) \qquad (t \le T)\ ?$$

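To express the answer, we first introduce the following notation.

Definition 2.2

(i) $P_r$ := projection of $L^2([0,T]^r)$ onto $\hat L^2([0,T]^r)$;

$$(P_r h)(\sigma_1, \ldots, \sigma_r) = \frac{1}{r!} \sum_{\pi \in S_r} h(\sigma_{\pi(1)}, \ldots, \sigma_{\pi(r)})$$

where $S_r$ = permutation group on r letters.

(ii) For integers r, q, k, $0 \le k \le \min(r,q)$, and $f \in \hat L^2([0,T]^r)$, $g \in \hat L^2([0,T]^q)$,

$$(f\,\tilde\otimes_k(t)\,g)(\sigma_1, \ldots, \sigma_{r+q-2k}) = \frac{1}{k!} \int_0^t \cdots \int_0^t f(s_1, \ldots, s_k, \sigma_1, \ldots, \sigma_{r-k})\,g(s_1, \ldots, s_k, \sigma_{r-k+1}, \ldots, \sigma_{r+q-2k})\,ds_k \cdots ds_1$$

(iii) $f \otimes_k(t)\,g := P_{r+q-2k}[f\,\tilde\otimes_k(t)\,g]$

(iv) $f \otimes g := f \otimes_0(t)\,g = P_{r+q}[f(\sigma_1, \ldots, \sigma_r)\,g(\sigma_{r+1}, \ldots, \sigma_{r+q})]$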
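
The two operations of Definition 2.2, symmetrization of a kernel and the symmetrized k-fold contraction of two kernels (integrate the first k arguments of both over [0,t]^k, divide by k!, then symmetrize), can be sketched on kernels sampled on a uniform grid. The code below is an illustrative discretization with assumed names (`symmetrize`, `contract`, `n_t` for the number of grid cells in [0,t]); Riemann sums stand in for the integrals, and nothing here is taken from the paper beyond the definitions themselves.

```python
import itertools
import math

import numpy as np


def symmetrize(a):
    """Average an r-dimensional array over all permutations of its axes."""
    if a.ndim <= 1:
        return a.copy()
    perms = list(itertools.permutations(range(a.ndim)))
    return sum(np.transpose(a, p) for p in perms) / len(perms)


def contract(f, g, k, dt, n_t):
    """Symmetrized k-fold contraction of sampled kernels f and g.

    The first k axes of f and g are summed over the first n_t grid cells
    (a Riemann sum for the integral over [0, t]^k with t = n_t * dt),
    the result is divided by k!, and then symmetrized.
    """
    if k == 0:
        out = np.multiply.outer(f, g)
    else:
        window = (slice(0, n_t),) * k
        axes = list(range(k))
        out = np.tensordot(f[window], g[window], axes=(axes, axes))
        out = out * dt ** k / math.factorial(k)
    return symmetrize(np.asarray(out))
```

For two first-order kernels, `contract(f, g, 1, dt, n)` is the Riemann sum of f·g, and `contract(f, g, 0, dt, n)` is the symmetrized outer product (f(a)g(b) + f(b)g(a))/2.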


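$\otimes_k(t)$ is the operation by which new kernels are created from old; indeed,

$$f \otimes_k(t)\,g : \hat L^2([0,T]^r) \times \hat L^2([0,T]^q) \to \hat L^2([0,T]^{r+q-2k})$$

as the following lemma demonstrates.

Lemma 2.3 For every $t \le T$,

$$f \otimes_k(t)\,g \in \hat L^2([0,T]^{r+q-2k}).$$

In fact,

$$\|f \otimes_k(t)\,g\|_2 \le c_{r,k}\,\|f\|_2\,\|g\|_2$$

where $c_{r,k}$ is independent of f and g.

Proof It suffices to prove the lemma for the unsymmetrized product $\tilde\otimes_k$ instead of $\otimes_k$, since $P_{r+q-2k}$ is a bounded
operator. Let $d\sigma = d\sigma_1 \cdots d\sigma_{r+q-2k}$, $ds = ds_1 \cdots ds_k$; then we have, using the Cauchy-Schwarz
inequality,

$$\|f\,\tilde\otimes_k(t)\,g\|_2^2 = \frac{1}{(k!)^2} \int_{[0,T]^{r+q-2k}} \Big[\int_{[0,t]^k} f(s_1, \ldots, s_k, \sigma_1, \ldots)\,g(s_1, \ldots, s_k, \ldots, \sigma_{r+q-2k})\,ds\Big]^2 d\sigma \le \frac{1}{(k!)^2}\,\|f\|_2^2\,\|g\|_2^2.$$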


To understand the meaning of $\otimes_k(t)$, it is useful to think of the functions f and
g as tensors, which they in fact are by the isomorphism

$$L^2([0,T]^r) \cong L^2([0,T]) \otimes \cdots \otimes L^2([0,T]) \qquad (r\text{-fold}).$$

Then the unsymmetrized product $f\,\tilde\otimes_k(t)\,g$ may be viewed as a tensor contraction since it 'sums', that is,
integrates, f and g along the first k indices. Thus $f \otimes_k(t)\,g$ is simply a
symmetrized, k-fold, tensor contraction. It is in this definition that the symmetry of
f and g is used; otherwise $\otimes_k(t)$ would have a more complicated definition. For
notational convenience, we shall often write $\otimes_k$ instead of $\otimes_k(t)$, in which case the
(t) is to be assumed. When the time parameter is important or different from t, it will
always be given.




We can now state the result.

Theorem 2.1 Let $f \in \hat L^2([0,T]^r)$, $g \in \hat L^2([0,T]^q)$. Then

$$I_t^r(f)\,I_t^q(g) = \sum_{k=0}^{\min(r,q)} I_t^{r+q-2k}\Big(\binom{r+q-2k}{r-k}\,f \otimes_k(t)\,g\Big). \qquad (2.14)$$


Remarks 1. (2.14) shall be referred to as the multiplication formula. Hida has
independently derived this result as an application of his theory of generalised Brownian
functionals (personal communication of T. Hida; for generalised Brownian functional theory,
see Hida [4]). Our proof is elementary, using only Ito's differentiation rule. For
similar theory, see also Meyer [15]. Versions of this formula are also known in
mathematical quantum field theory (Reed, Simon [19]). See Mitter and Ocone [17] for
further comments.

2. The multiplication formula generalises a Hermite polynomial identity. The nth order
Hermite polynomial of a single variable is

$$h_n(x) = \frac{(-1)^n}{\sqrt{n!}}\,e^{x^2/2}\,\frac{d^n}{dx^n}\,e^{-x^2/2}.$$

Let

$$G_r = \{I_t^r(f) \mid f \in \hat L^2([0,t]^r)\}$$

and let $\{\phi_n\}_{n=1}^{\infty}$ be an orthonormal basis of $L^2([0,t])$. Then (Ito [7], Kallianpur [8])

$$G_r = \overline{\mathrm{Sp}}\Big\{\prod_{i=1}^{m} h_{p_i}\Big(\int_0^t \phi_{n_i}(s)\,db(s)\Big) \;\Big|\; p_1 + \cdots + p_m = r,\ n_1, \ldots, n_m \text{ are pairwise unequal}\Big\}$$

where $\overline{\mathrm{Sp}}$ denotes the closure in $L^2(P)$ of the linear span. One then sees that (2.14)
generalises the identity ([12])

$$h_r(x)\,h_q(x) = \sum_{k=0}^{\min(r,q)} k!\binom{r}{k}\binom{q}{k}\Big[\frac{(r+q-2k)!}{r!\,q!}\Big]^{1/2} h_{r+q-2k}(x) \qquad (2.15)$$
∀ r, q ≥ 0. There is a discrepancy between (2.15) and (2.14) in the factors multiplying the
expansion terms, but this is due to the different normalizations involved in the
definitions of $I^r$ and $\otimes$. The relationship between (2.14) and (2.15) may be seen
clearly in Hida's work, but we shall not pursue the matter further here.
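
The Hermite product identity mentioned in this remark can be checked numerically. The sketch below is an illustration only, and it uses a different normalization from the text: NumPy's unnormalized probabilists' Hermite polynomials He_n, for which the linearization formula reads He_r He_q = Σ_k k! C(r,k) C(q,k) He_{r+q-2k}. The helper names are assumptions; `hermemul` is NumPy's routine for multiplying two series expressed in the He basis.

```python
from math import comb, factorial

import numpy as np
from numpy.polynomial import hermite_e as He


def hermite_product_coeffs(r, q):
    """Closed-form coefficients of He_r * He_q in the He basis.

    Returns a dict mapping the index r + q - 2k to k! * C(r, k) * C(q, k).
    """
    return {r + q - 2 * k: factorial(k) * comb(r, k) * comb(q, k)
            for k in range(min(r, q) + 1)}


def hermite_product_numeric(r, q):
    """Coefficients of He_r * He_q in the He basis, computed by NumPy."""
    e_r = np.zeros(r + 1)
    e_r[r] = 1.0  # series consisting of He_r alone
    e_q = np.zeros(q + 1)
    e_q[q] = 1.0  # series consisting of He_q alone
    return He.hermemul(e_r, e_q)
```

Only indices of the form r + q - 2k carry non-zero coefficients, which mirrors the fact that the product of an rth and a qth order multiple integral contains only integrals of orders r + q, r + q - 2, ..., |r - q|.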

We will show how to prove theorem 2.1 using Ito's rule and induction. For this
purpose, we need certain facts and identities concerning $\otimes$, and these are collected in
the next lemma. The notation $f(s_1, \ldots, s_k, \ldots)$ indicates the section of f in which the
first k variables are fixed at $s_1, \ldots, s_k$ respectively.

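Lemma 2.4

(i) $[f(\sigma, \ldots) \otimes_k(\sigma)\,g](\sigma_1, \ldots, \sigma_{r+q-2k-1}) \in L^2([0,T]^{r+q-2k})$

(ii) $f \otimes_k(t)\,g = f \otimes_k(\sigma)\,g + \int_\sigma^t f(s, \ldots) \otimes_{k-1}(s)\,g(s, \ldots)\,ds \qquad (2.16)$

(iii) For k ≥ 1,

$$(f \otimes_k(t)\,g)(\sigma_1, \ldots, \sigma_{r+q-2k}) = \Big[\frac{r-k}{r+q-2k}\,f(\sigma_1, \ldots) \otimes_k(t)\,g + \frac{q-k}{r+q-2k}\,f \otimes_k(t)\,g(\sigma_1, \ldots)\Big](\sigma_2, \ldots, \sigma_{r+q-2k}) \qquad (2.17)$$

(iv)

$$(f \otimes g)(\sigma_1, \ldots, \sigma_{r+q}) = \Big[\frac{r}{r+q}\,f(\sigma_1, \ldots) \otimes g + \frac{q}{r+q}\,f \otimes g(\sigma_1, \ldots)\Big](\sigma_2, \ldots, \sigma_{r+q}) \qquad (2.18)$$
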

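Proof (i) follows from calculations similar to those in lemma 2.3. The details will not be
presented.

(ii) By direct calculation and definition, using the symmetry of f and g extensively,

$$f \otimes_k(t)\,g = P_{r+q-2k}\Big[\frac{1}{k!} \int_0^t \cdots \int_0^t f(s_1, \ldots, s_k, \ldots)\,g(s_1, \ldots, s_k, \ldots)\,ds_k \cdots ds_1\Big]$$

$$= P_{r+q-2k}\Big[\frac{1}{(k-1)!} \int_0^t \int_0^{s_1} \cdots \int_0^{s_1} f(s_1, \ldots, s_k, \ldots)\,g(s_1, \ldots, s_k, \ldots)\,ds_k \cdots ds_1\Big]$$

$$= P_{r+q-2k}\Big[\frac{1}{k!} \int_0^\sigma \cdots \int_0^\sigma f(s_1, \ldots, s_k, \ldots)\,g(s_1, \ldots, s_k, \ldots)\,ds_k \cdots ds_1\Big] + P_{r+q-2k}\Big[\frac{1}{(k-1)!} \int_\sigma^t \int_0^{s_1} \cdots \int_0^{s_1} f(s_1, \ldots, s_k, \ldots)\,g(s_1, \ldots, s_k, \ldots)\,ds_k \cdots ds_1\Big]$$

$$= f \otimes_k(\sigma)\,g + \int_\sigma^t ds\, f(s, \ldots) \otimes_{k-1}(s)\,g(s, \ldots).$$

(iii) and (iv). The proofs of (iii) and (iv) are similar, (iv) being just a special case
of (iii). We shall only present (iv), as it is simpler. Note first that, by definition,

$$[f(\sigma_1, \ldots) \otimes g](\sigma_2, \ldots, \sigma_{r+q}) = \frac{1}{(r+q-1)!} \sum_{\pi \in S_{r+q-1}} f(\sigma_1, \sigma_{\pi(2)}, \ldots, \sigma_{\pi(r)})\,g(\sigma_{\pi(r+1)}, \ldots, \sigma_{\pi(r+q)}) \qquad (2.19)$$

where $\pi \in S_{r+q-1}$ is interpreted as a permutation of {2, ..., r+q}. Now using the symmetry
of f, (2.19) may be written as:

$$\frac{1}{r\,(r+q-1)!} \sum_{j=1}^{r} \sum_{\pi \in S_{r+q-1}} f(\sigma_{\pi(2)}, \ldots, \sigma_{\pi(j)}, \sigma_1, \sigma_{\pi(j+1)}, \ldots, \sigma_{\pi(r)})\,g(\sigma_{\pi(r+1)}, \ldots, \sigma_{\pi(r+q)}). \qquad (2.20)$$

Using the expression analogous to (2.20) for $f \otimes g(\sigma_1, \ldots)$,

$$\Big\{\frac{r}{r+q}\,f(\sigma_1, \ldots) \otimes g + \frac{q}{r+q}\,f \otimes g(\sigma_1, \ldots)\Big\}(\sigma_2, \ldots, \sigma_{r+q})$$

$$= \frac{1}{(r+q)!}\Big[\sum_{j=1}^{r} \sum_{\pi \in S_{r+q-1}} f(\sigma_{\pi(2)}, \ldots, \sigma_1, \ldots, \sigma_{\pi(r)})\,g(\sigma_{\pi(r+1)}, \ldots, \sigma_{\pi(r+q)}) + \sum_{j=1}^{q} \sum_{\pi \in S_{r+q-1}} f(\sigma_{\pi(2)}, \ldots, \sigma_{\pi(r+1)})\,g(\sigma_{\pi(r+2)}, \ldots, \sigma_1, \ldots, \sigma_{\pi(r+q)})\Big]$$

$$= \frac{1}{(r+q)!} \sum_{\pi \in S_{r+q}} f(\sigma_{\pi(1)}, \ldots, \sigma_{\pi(r)})\,g(\sigma_{\pi(r+1)}, \ldots, \sigma_{\pi(r+q)}) = (f \otimes g)(\sigma_1, \ldots, \sigma_{r+q}).$$

This is the desired result.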


Proof of theorem 2.1. We use Ito's differentiation formula and the preceding lemmas to
implement an induction argument that proceeds in two steps:

(a) Show (by induction) that (2.14) holds for orders r = n, q = 1, ∀n.

(b) Assuming (2.14) for (r-1,q), (r,q-1) and (r-1,q-1), show that it holds for (r,q).
(a) and (b) then provide a consistent scheme of induction for proving theorem 2.1 for all
orders.

Step (a) By Ito's differentiation rule

$$\int_0^t f(s)\,db(s) \int_0^t g(s)\,db(s) = \int_0^t \int_0^{s_1} [f(s_1)g(s_2) + f(s_2)g(s_1)]\,db(s_2)\,db(s_1) + \int_0^t f(s)\,g(s)\,ds.$$

This proves the case r = q = 1.
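
The case r = q = 1 has an exact discrete counterpart that may help in reading the formula: the product of two sums Σ f_i Δb_i and Σ g_j Δb_j splits into an off-diagonal part, which plays the role of the second-order integral with kernel f(s_1)g(s_2) + f(s_2)g(s_1), and a diagonal part Σ f_i g_i (Δb_i)^2, which plays the role of ∫ f g ds because (Δb)^2 behaves like Δt. The Python sketch below is an illustration with assumed names and a finite grid; it is not part of the proof.

```python
import numpy as np


def product_expansion_discrete(f, g, db):
    """Split (sum_i f_i db_i) * (sum_j g_j db_j) into two parts.

    Returns (second_order, diagonal):
      second_order : sum over i > j of (f_i g_j + f_j g_i) db_j db_i,
                     the discrete analogue of the second-order integral
      diagonal     : sum_i f_i g_i db_i**2, the discrete analogue of the
                     correction term int f g ds
    """
    n = len(db)
    second_order = 0.0
    for i in range(n):
        for j in range(i):
            second_order += (f[i] * g[j] + f[j] * g[i]) * db[j] * db[i]
    diagonal = float(np.sum(f * g * db ** 2))
    return second_order, diagonal
```

If the increments are taken as ±sqrt(Δt), as in a simple random-walk approximation of Brownian motion, the diagonal part equals Σ f_i g_i Δt exactly.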

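Suppose that the theorem is true for (r,q) = (n-1,1) and let $f \in \hat L^2([0,T]^n)$,
$g \in \hat L^2([0,T])$. Applying Ito's differentiation rule again,

$$I_t^n(f)\,I_t^1(g) = \int_0^t g(s)\,I_s^n(f)\,db(s) + \int_0^t I_s^{n-1}(f(s, \ldots))\,I_s^1(g)\,db(s) + \int_0^t I_s^{n-1}(g(s)\,f(s, \ldots))\,ds. \qquad (2.21)$$

By induction,

$$I_s^{n-1}(f(s, \ldots))\,I_s^1(g) = I_s^n(n[f(s, \ldots) \otimes g]) + I_s^{n-2}(f(s, \ldots) \otimes_1(s)\,g).$$

Lemma 2.4 (i) and lemma 2.1 justify interchanging integrations in the last term of (2.21):

$$\int_0^t I_s^{n-1}(g(s)\,f(s, \ldots))\,ds = I_t^{n-1}\Big(\int_{\sigma_1}^t g(u)\,f(u, \sigma_1, \ldots, \sigma_{n-1})\,du\Big).$$

Thus, by substitution in (2.21),

$$I_t^n(f)\,I_t^1(g) = \int_0^t \{I_s^n(g(s)\,f) + I_s^n(n[f(s, \ldots) \otimes g])\}\,db(s) + \int_0^t I_s^{n-2}(f(s, \ldots) \otimes_1(s)\,g)\,db(s) + I_t^{n-1}\Big(\int_{\sigma_1}^t g(u)\,f(u, \sigma_1, \ldots, \sigma_{n-1})\,du\Big)$$

$$= I_t^{n+1}\{g(\sigma_1)\,f(\sigma_2, \ldots, \sigma_{n+1}) + n[f(\sigma_1, \ldots) \otimes g](\sigma_2, \ldots, \sigma_{n+1})\} + I_t^{n-1}\Big\{[f(\sigma_1, \ldots) \otimes_1(\sigma_1)\,g](\sigma_2, \ldots, \sigma_{n-1}) + \int_{\sigma_1}^t g(s)\,f(s, \sigma_1, \ldots, \sigma_{n-1})\,ds\Big\}.$$

By lemma 2.4 (iii) and (iv) this becomes

$$I_t^{n+1}((n+1)\,f \otimes g) + I_t^{n-1}(f \otimes_1(t)\,g),$$

which completes the induction step of (a).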

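Step (b) Without loss of generality assume that q ≤ r. The induction hypothesis is that
theorem 2.1 is true for (r-1,q), (r,q-1), and (r-1,q-1). Apply Ito's differentiation rule:

$$I_t^r(f)\,I_t^q(g) = \int_0^t I_s^q(g)\,I_s^{r-1}(f(s, \ldots))\,db(s) + \int_0^t I_s^{q-1}(g(s, \ldots))\,I_s^r(f)\,db(s) + \int_0^t I_s^{r-1}(f(s, \ldots))\,I_s^{q-1}(g(s, \ldots))\,ds. \qquad (2.22)$$

Next, use the induction hypothesis to expand the integrands in (2.22), then interchange
ds and db(s) integrations where necessary, and collect like order terms. The result is,
for q ≤ r,

$$I_t^r(f)\,I_t^q(g) = I_t^{r+q}\Big\{\Big[\binom{r+q-1}{r-1} f(s_1, \ldots) \otimes g + \binom{r+q-1}{r} f \otimes g(s_1, \ldots)\Big](s_2, \ldots, s_{r+q})\Big\}$$

$$+ \sum_{k=1}^{q-1} I_t^{r+q-2k}\Big\{\binom{r+q-2k-1}{r-k-1}[f(s_1, \ldots) \otimes_k(s_1)\,g](s_2, \ldots, s_{r+q-2k}) + \binom{r+q-2k-1}{r-k}[f \otimes_k(s_1)\,g(s_1, \ldots)](s_2, \ldots, s_{r+q-2k}) + \binom{r+q-2k}{r-k} \int_{s_1}^t f(u, \ldots) \otimes_{k-1}(u)\,g(u, \ldots)\,du\Big\}$$

$$+ I_t^{r-q}\Big\{[f(s_1, \ldots) \otimes_q(s_1)\,g](s_2, \ldots, s_{r-q}) + \int_{s_1}^t f(u, \ldots) \otimes_{q-1}(u)\,g(u, \ldots)\,du\Big\}.$$

To complete the proof, we need only apply the identities of lemma 2.4 (iii) and (iv) to the
kernels of this last expression. For example, the kernel of $I_t^{r+q-2k}$, 1 ≤ k ≤ q-1, equals

$$\binom{r+q-2k}{r-k}\Big\{\Big[\frac{r-k}{r+q-2k}\,f(s_1, \ldots) \otimes_k(s_1)\,g + \frac{q-k}{r+q-2k}\,f \otimes_k(s_1)\,g(s_1, \ldots)\Big](s_2, \ldots) + \Big(\int_{s_1}^t f(u, \ldots) \otimes_{k-1}(u)\,g(u, \ldots)\,du\Big)(s_2, \ldots)\Big\}$$

$$= \binom{r+q-2k}{r-k}\Big\{(f \otimes_k(s_1)\,g)(s_1, s_2, \ldots) + \Big(\int_{s_1}^t f(u, \ldots) \otimes_{k-1}(u)\,g(u, \ldots)\,du\Big)(s_2, \ldots)\Big\}$$

$$= \binom{r+q-2k}{r-k}(f \otimes_k(t)\,g)(s_1, \ldots, s_{r+q-2k}).$$

This is the kernel given in (2.14). The kernels of $I_t^{r+q-2k}$, k = 0 and k = q, are
treated similarly. This completes the proof.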


3. Multiple Integral Expansions in Filtering Theory.

This section explores the use of multiple integral expansions for optimal and
suboptimal filtering. The estimation problem considered is the general problem stated in
the introduction, and the notations and assumptions established there shall remain in
force. For additional notational convenience, we let $f(t) := f(t; x_s, s \le t)$,
$h(s) := h(s, x(s))$ and $\hat f_t := E\{f(t) \mid F_t^y\}$.

3.1. Expansion of the optimal filter

In theorem 3.1 below we derive an expression for $\hat f_t$ as a ratio of two multiple
integral expansions in which the process of integration is y(t), the observation semi-
martingale, and the integrands are deterministic functionals computable from the
(unconditioned) distribution of the signal process. First we state some preliminary
definitions and a lemma.
Let

$$L_t := \exp\Big[\int_0^t h(s)\,dy(s) - \frac{1}{2} \int_0^t h^2(s)\,ds\Big].$$

$L_t$ is the important process in this calculation. Observe that
$L_t = E_0\{dP/dP_0 \mid F_t^{x,y}\}$ (theorem 1.1 (ii)) and
$(L_t, F_\infty^x \vee F_t^y)$ is a martingale on $(\Omega, F, P_0)$, ($F_\infty^x := \sigma\{x(s);\, s \in R_+\}$). A conditioning
argument then shows that the Kallianpur-Striebel formula, (1.2), can be expressed as

$$\hat f_t = \frac{E_0\{f(t)\,L_t \mid F_t^y\}}{E_0\{L_t \mid F_t^y\}}. \qquad (3.1)$$

The following process, based on $L_t$, will also appear:

$$L_t^{(r)} := \int_0^t \int_0^{s_1} \cdots \int_0^{s_{r-1}} L_{s_r}\,h(s_1) \cdots h(s_r)\,dy(s_r) \cdots dy(s_1).$$

Note that $L_t^{(r)}$ is not a multiple integral of the type defined in §2 since the integrand
is not deterministic. $L_t^{(r)}$ may be properly defined by noticing that

$$L_t^{(r)} = \int_0^t h(s)\,L_s^{(r-1)}\,dy(s).$$

Iterative use of the stochastic Ito integral then specifies $L_t^{(r)}$ for any order r. This
is especially easy to carry out on $(\Omega, F, P_0)$, on which y(t) is a Brownian motion
independent of the signal (see theorem 1.1).

The following stochastic Fubini theorem for interchanging conditional expectation and stochastic integration is needed; it is a direct consequence of theorem 5.14 in Liptser and Shiryayev [11].

Lemma 3.1 Let $\phi(s)$ be an $F_s^x \vee F_s^y$ adapted process such that

$$E_0 \int_0^T \phi^2(s)\,ds < \infty.$$

Then $E_0\{\int_0^t \phi(s)\,dy(s) \mid F_t^y\} = \int_0^t E_0\{\phi(s) \mid F_s^y\}\,dy(s)$.

Finally, it is convenient to introduce the functions

$$\ell_n(t, s_1, \ldots, s_n) := E\{f(t)\, h(s_1) \cdots h(s_n)\}, \qquad n \ge 0,$$

$$k_n(s_1, \ldots, s_n) := E\{h(s_1) \cdots h(s_n)\}, \qquad n \ge 1,$$

$$k_0 := 1.$$

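Theorem 3.1

i) (Partial expansion) If $E\big(\int_0^t h^2(\sigma)\,d\sigma\big)^r < \infty$ and $E\big[f^2(t)\big(\int_0^t h^2(\sigma)\,d\sigma\big)^r\big] < \infty$, then

$$\hat f_t = \frac{\sum_{n=0}^{r} I_t^n(\ell_n(t)) + E_0\{f(t)\, L_t^{(r+1)} \mid F_t^y\}}{\sum_{n=0}^{r} I_t^n(k_n) + E_0\{L_t^{(r+1)} \mid F_t^y\}}. \tag{3.2}$$

ii) (Full expansion) If $E\big(\exp \int_0^t h^2(s)\,ds\big) < \infty$ and $E\big[f^2(t) \exp \int_0^t h^2(s)\,ds\big] < \infty$, then

$$\hat f_t = \frac{\sum_{n=0}^{\infty} I_t^n(\ell_n(t))}{\sum_{n=0}^{\infty} I_t^n(k_n)}, \tag{3.3}$$

and the expansions converge in $L^1(P)$.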

Proof:

Part i) By applying Ito's differentiation rule to $L_t$,

$$dL_s = h(s)\, L_s\, dy(s),$$

so that

$$L_t = 1 + \int_0^t h(s)\, L_s\, dy(s). \tag{3.4}$$

Iterating (3.4), we find that for any $r$

$$L_t = 1 + \int_0^t h(s)\,dy(s) + \int_0^t \int_0^{s_1} h(s_1) h(s_2)\,dy(s_2)\,dy(s_1) + \cdots + \int_0^t \cdots \int_0^{s_{r-1}} h(s_1) \cdots h(s_r)\,dy(s_r) \cdots dy(s_1) + L_t^{(r+1)}.$$

Now substitute this expansion into the Kallianpur-Striebel formula (3.1) for $\hat f_t$. The denominator, for example, becomes

$$E_0\{L_t \mid F_t^y\} = 1 + \sum_{n=1}^{r} E_0\Big\{\int_0^t \cdots \int_0^{s_{n-1}} h(s_1) \cdots h(s_n)\,dy(s_n) \cdots dy(s_1) \,\Big|\, F_t^y\Big\} + E_0\{L_t^{(r+1)} \mid F_t^y\}. \tag{3.5}$$

The hypothesis $E\big[\int_0^t h^2(s)\,ds\big]^r < \infty$ of (i) allows lemma 3.1 to be applied to the terms of (3.5), with the result

$$E_0\{L_t \mid F_t^y\} = 1 + \sum_{n=1}^{r} \int_0^t \cdots \int_0^{s_{n-1}} E_0\{h(s_1) \cdots h(s_n)\}\,dy(s_n) \cdots dy(s_1) + E_0\{L_t^{(r+1)} \mid F_t^y\}.$$

Since the distribution of the signal process is invariant under the change of measure from $P$ to $P_0$,

$$E_0\{h(s_1) \cdots h(s_n)\} = E\{h(s_1) \cdots h(s_n)\} = k_n(s_1, \ldots, s_n).$$

Therefore

$$E_0\{L_t \mid F_t^y\} = \sum_{n=0}^{r} I_t^n(k_n) + E_0\{L_t^{(r+1)} \mid F_t^y\}.$$

A similar calculation yields

$$E_0\{f(t) L_t \mid F_t^y\} = \sum_{n=0}^{r} I_t^n(\ell_n(t)) + E_0\{f(t) L_t^{(r+1)} \mid F_t^y\}.$$

Substitution of these expressions into the Kallianpur-Striebel formula then proves (3.2).
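Part ii) Formally, the proof of the full expansion follows by setting $r = \infty$. To prove it rigorously, we first show that $E \exp\big[\int_0^T h^2(s)\,ds\big] < \infty$ implies

$$L_t = \text{m.s.}(P_0)\lim_{N \to \infty} \Big[1 + \sum_{n=1}^{N} \int_0^t \cdots \int_0^{s_{n-1}} h(s_1) \cdots h(s_n)\,dy(s_n) \cdots dy(s_1)\Big]. \tag{3.6}$$

Denote the finite series on the right hand side of (3.6) by $A_t^N$. Then

$$E_0(L_t - A_t^N)^2 = E_0\Big[\int_0^t \cdots \int_0^{s_N} h(s_1) \cdots h(s_{N+1})\, L_{s_{N+1}}\,dy(s_{N+1}) \cdots dy(s_1)\Big]^2.$$

By employing the standard computational rules (2.1), (2.2) for stochastic integrals, this last expression equals

$$\int_0^t \cdots \int_0^{s_N} E_0\{h^2(s_1) \cdots h^2(s_{N+1})\, L_{s_{N+1}}^2\}\,ds_{N+1} \cdots ds_1,$$

provided that it is finite. However,

$$E_0\{h^2(s_1) \cdots h^2(s_{N+1}) L_{s_{N+1}}^2\} = E_0\Big\{h^2(s_1) \cdots h^2(s_{N+1}) \exp\Big[-\int_0^{s_{N+1}} h^2(s)\,ds\Big]\, E_0\Big[\exp\Big(2\int_0^{s_{N+1}} h(s)\,dy(s)\Big) \,\Big|\, F_\infty^x\Big]\Big\}. \tag{3.7}$$

Now on $(\Omega, P_0)$, $x(\cdot)$ and $y(\cdot)$ are independent and $y(\cdot)$ is Brownian, and hence, given $\{x(s),\ s \le s_{N+1}\}$, $\int_0^{s_{N+1}} h(s)\,dy(s)$ is a Gaussian random variable with mean 0 and variance $\int_0^{s_{N+1}} h^2(s)\,ds$. Thus

$$E_0\Big[\exp 2\int_0^{s_{N+1}} h(s)\,dy(s) \,\Big|\, F_\infty^x\Big] = \exp 2\int_0^{s_{N+1}} h^2(s)\,ds. \tag{3.8}$$

Therefore, using (3.8) in (3.7),

$$(3.7) = E_0\Big\{h^2(s_1) \cdots h^2(s_{N+1}) \exp\Big[\int_0^{s_{N+1}} h^2(s)\,ds\Big]\Big\} = E_0\Big\{h^2(s_1) \cdots h^2(s_{N+1}) \sum_{j=0}^{\infty} \int_0^{s_{N+1}} \int_0^{\sigma_1} \cdots \int_0^{\sigma_{j-1}} h^2(\sigma_1) \cdots h^2(\sigma_j)\,d\sigma_j \cdots d\sigma_1\Big\}.$$

As a result

$$\int_0^t \cdots \int_0^{s_N} E_0\{h^2(s_1) \cdots h^2(s_{N+1}) L_{s_{N+1}}^2\}\,ds_{N+1} \cdots ds_1 = \sum_{j=N+1}^{\infty} \int_0^t \cdots \int_0^{s_{j-1}} E_0[h^2(s_1) \cdots h^2(s_j)]\,ds_j \cdots ds_1 = \sum_{j=N+1}^{\infty} \frac{1}{j!}\, E_0\Big(\int_0^t h^2(s)\,ds\Big)^j. \tag{3.9}$$

Since $E \exp\big[\int_0^T h^2(s)\,ds\big] < \infty$, (3.9) tends to 0 as $N \to \infty$, proving that $L_t = \text{m.s.}(P_0) \lim A_t^N$ for all $t \le T$, as desired. Lemma 3.1 can now be invoked for every order $n$, so that

$$E_0\{L_t \mid F_t^y\} = E_0\{\text{m.s.}\lim_{N} A_t^N \mid F_t^y\} = \text{m.s.}\lim_{N \to \infty} E_0\{A_t^N \mid F_t^y\} = \text{m.s.}(P_0)\lim_{N \to \infty}\Big[1 + \sum_{n=1}^{N} I_t^n(k_n)\Big].$$

A similar proof expands $E_0\{f(t) L_t \mid F_t^y\}$ in the series

$$\ell_0(t) + \sum_{n=1}^{\infty} I_t^n(\ell_n(t)).$$

Finally, to derive the $L^1(P)$ convergence, note that

$$E_0\Big[\frac{dP}{dP_0}\Big]^2 = E_0 L_T^2 = E \exp\Big[\int_0^T h^2(s)\,ds\Big] < \infty.$$

By the Cauchy-Schwarz inequality,

$$\Big(E\Big|E_0[L_t \mid F_t^y] - \Big(1 + \sum_{n=1}^{N} I_t^n(k_n)\Big)\Big|\Big)^2 \le E_0\Big[\frac{dP}{dP_0}\Big]^2\, E_0\Big(E_0[L_t \mid F_t^y] - \Big(1 + \sum_{n=1}^{N} I_t^n(k_n)\Big)\Big)^2.$$

Thus, from (3.6), $E_0[L_t \mid F_t^y] = (L^1(P)) \lim_{N \to \infty} \big[1 + \sum_{n=1}^{N} I_t^n(k_n)\big]$ as claimed. This completes the proof of theorem 3.1.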


Let $P(A, t \mid F_t^y) = E[1_A(x(t)) \mid F_t^y]$ denote the conditional distribution of $x(t)$ given the observation up to time $t$.

Corollary 3.1 If $E\big[\exp \int_0^T h^2(s)\,ds\big] < \infty$,

$$P(A, t \mid F_t^y) = \frac{E\,1_A(x(t)) + \sum_{n=1}^{\infty} I_t^n\big(E\,1_A(x(t))\, h(s_1) \cdots h(s_n)\big)}{1 + \sum_{n=1}^{\infty} I_t^n\big(E\,h(s_1) \cdots h(s_n)\big)}.$$

A related formula is also of interest. If $x(t)$ has a density $q(x,t)$, then $x(t)$ has a conditional density given by

$$p(x, t \mid F_t^y) = \frac{E_0[L(t) \mid F_t^y,\ x(t) = x]\; q(x,t)}{E_0[L(t) \mid F_t^y]}.$$

Using the same techniques as above, we can easily derive

$$E_0[L(t) \mid F_t^y,\ x(t) = x]\; q(x,t) = \Big(1 + \sum_{n=1}^{\infty} I_t^n\big(E[h(s_1) \cdots h(s_n) \mid x(t) = x]\big)\Big)\, q(x,t) \tag{3.10}$$

for the numerator of $p(x, t \mid F_t^y)$. (3.10) is often called the unnormalized conditional density.

Remark: These results all have an obvious generalization to the multidimensional case.

The Bayes formula (3.1) for $\hat f_t$ is properly viewed as the ratio of two conditioned functional integrals, in which the dependencies between $x(\cdot)$ and $y(\cdot)$ are linked in the $L_t$ term. The expansions of theorem 3.1 in effect calculate these functional integrals by expanding $L_t$. The $x(\cdot)$ and $y(\cdot)$ interactions are then separated in the sense that the calculation of the filter is decomposed into two parts: first, computation, off-line and prior to filtering, of the kernels $\ell_n$ and $k_n$, and, second, stochastic integration of these kernels against the observations. Of course, in actual practice one can only compute a finite number of terms. In fact, if the kernels are separable or are approximated by separable versions, a truncated expansion may be realized in a finite dimensional and recursive manner, because a stochastic differential system can be constructed to realize any multiple integral with a separable kernel. However, caution must be exercised in approximating the optimal filter by truncations in (3.3), because truncation of the series in the denominator can be a source of severe instability. Although $E_0\{L(t) \mid F_t^y\} > 0$ a.s., a truncation approximation may pass through 0 and so lead to a singularity of the filter. Thus an independent estimate of the denominator is in general required.
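The sign problem of a truncated denominator can be seen in a toy computation. The sketch below again assumes a constant $h \equiv c$ (an illustrative assumption, not a case analyzed in the text), for which the untruncated series sums to $\exp(c\,y_t - c^2 t/2) > 0$ and its partial sums are Hermite polynomial sums in $y_t$. The function name is hypothetical.

```python
import math


def truncated_denominator(order, c, y, t):
    """Partial sum, through the given order, of the iterated-integral series for
    exp(c*y - c**2*t/2), assuming a constant observation function h = c.

    Terms are u**n * He_n(x) / n! with x = y/sqrt(t), u = c*sqrt(t), generated by
    the recurrence He_{n+1}(x) = x*He_n(x) - n*He_{n-1}(x).
    """
    x, u = y / math.sqrt(t), c * math.sqrt(t)
    he_prev, he = 0.0, 1.0   # He_{-1} (placeholder, multiplied by 0) and He_0
    total, scale = 0.0, 1.0  # scale holds u**n / n!
    for n in range(order + 1):
        total += scale * he
        he_prev, he = he, x * he - n * he_prev
        scale *= u / (n + 1)
    return total


# With c = 1, t = 1 and an observation path ending at y_t = -3, the exact value
# exp(-3.5) is positive, yet the first-order truncation 1 + y_t is negative.
first_order = truncated_denominator(1, 1.0, -3.0, 1.0)
high_order = truncated_denominator(60, 1.0, -3.0, 1.0)
```

In this toy case the low-order partial sums change sign as the order increases, so a ratio formed with such a denominator would be singular along some observation paths, which is the instability described above.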

Recently, attention has focused on the unnormalized conditional density and the corresponding 'unnormalized' conditional moments, which are just the numerators of the Kallianpur-Striebel formula. E. Wong [22] has given a class of Markov signals for which analytic expressions of (3.10) are available. Again, truncation of (3.10) will in general yield functions that attain negative values. For this reason, cumulant expansions

$$p(x, t \mid F_t^y) = e^{[\ldots]}$$

have been studied as an alternate source of approximate filters (see Eterno [1]). We will not pursue these issues further, but instead turn to other theoretical developments based on theorem 3.1.


3.2 Best rth order filters

Finite sums of multiple integrals provide a natural class of causal functionals for the design of suboptimal filters. We introduce the following definition:

Definition 3.1

ii) The best rth order estimate of $f(\cdot)$ at time $t$ is an element $\tilde f(t) \in Y^r(t)$ such that

$$E(f(t) - \tilde f(t))^2 \le E(f(t) - b(t))^2 \tag{3.11}$$

for all $b(t) \in Y^r(t)$. The kernels of $\tilde f(t)$, denoted by $a_0(t), a_1(t), \ldots, a_r(t)$, are called the optimal kernels. A process $\tilde f(t) \in Y^r(t)$, $t \le T$, satisfying (3.11) for $t \le T$ is called the best rth order estimate of $f(\cdot)$.

Notice that the best 1st order filter is simply the linear filter, and thus, in the context of multiple integral expansions, best quadratic (2nd order), cubic, quartic, etc. filters are the natural extensions beyond linear filtering.

In this section we characterize the set of optimal kernels as the solution to a system of linear integral equations. The construction of these equations and the proof of their validity utilize the expansion formulae of theorem 3.1 and the multiplication formula for multiple integrals of theorem 2.1. Suppose for the instant that the full expansion (3.3) holds for the optimal filter and that $\tilde f(t) = \sum_{n=0}^{r} I_t^n(a_n(t))$ is an element of $Y^r(t)$, not necessarily the best. If $\tilde f(t)$ is to be a good approximation of $\hat f(t)$, we want

$$\tilde f(t) \approx \hat f(t) = \frac{\sum_{j=0}^{\infty} I_t^j(\ell_j(t))}{\sum_{j=0}^{\infty} I_t^j(k_j)}$$

or

$$\tilde f(t) \sum_{j=0}^{\infty} I_t^j(k_j) \approx \sum_{j=0}^{\infty} I_t^j(\ell_j(t)). \tag{3.12}$$

Now notice that the left hand side of (3.12) can be rewritten as a multiple integral expansion by applying the multiplication formula. In fact

$$\tilde f(t) \sum_{j=0}^{\infty} I_t^j(k_j) = \sum_{j=0}^{\infty} I_t^j(g_j(t)), \qquad g_j(t) := \sum_{(m,n,i) \in C_j^r} d_{m,n}^{(i)}\; a_m(t) \otimes_i(t)\, k_n, \tag{3.13}$$

where $C_j^r := \{(m,n,i) \mid m + n - 2i = j,\ i \le \min(m,n),\ m \le r\}$. Thus, one way to pick an approximation $\tilde f$ would be to choose the kernels $a_n(t)$ so that $g_j(t)$ matches $\ell_j(t)$ for as many orders $j$ as possible. In fact, this is a prescription for the optimal kernels.

Theorem 3.2. Assume $E\big(\int_0^t h^2(s)\,ds\big)^{2r} < \infty$ and $E\, f^2(t)\big(\int_0^t h^2(s)\,ds\big)^{2r} < \infty$. Then a best rth order estimate exists. It is given by $\tilde f(t) = \sum_{n=0}^{r} I_t^n(a_n(t))$ iff

$$g_j(t, s_1, \ldots, s_j) = E\{f(t)\, h(s_1) \cdots h(s_j)\} \quad \big(= \ell_j(t, s_1, \ldots, s_j)\big) \tag{3.14}$$

for $0 \le j \le r$.

Remark. The equations at (3.14) comprise $r + 1$ integral equations for the $r + 1$ optimal kernels $a_j(t)$, $0 \le j \le r$. This can be seen from the definition of $g_j(t)$ and $\otimes$ and will be illustrated explicitly in the examples to be discussed.

Before proving theorem 3.2, we first establish some preliminary lemmas. The first deals with existence of estimates.

Lemma 3.2. If $E\big(\int_0^t h^2(s)\,ds\big)^r < \infty$, then the best rth order estimate exists and is unique.

Proof. From lemma 2.2, $E[I_t^k(a)]^2 \le M_k \|a\|_{L^2}^2$ for $k \le r$. Therefore $Y^r(t)$ is a mean-square-closed (Hilbert) space of random variables. The lemma follows by the projection theorem.

Of the next two lemmas, the first introduces the optimal estimate to compare suboptimal estimates, and the second verifies a technical identity.

Lemma 3.3. Let $z, v \in L^2(\Omega, F_t^y, P)$. Then

$$E(z - f(t))^2 \le E(v - f(t))^2 \quad \text{iff} \quad E(z - \hat f(t))^2 \le E(v - \hat f(t))^2.$$

Proof. Simply note

$$E(z - f(t))^2 = E(z - \hat f(t))^2 + 2E(z - \hat f(t))(\hat f(t) - f(t)) + E(\hat f(t) - f(t))^2 = E(z - \hat f(t))^2 + E(\hat f(t) - f(t))^2,$$

the cross term vanishing because $z - \hat f(t)$ is $F_t^y$ measurable.

Proof of theorem 3.2. Because of lemma 3.3 it suffices to show (3.14) holds if and only if

$$E[\tilde f(t) - \hat f(t)]^2 \le E[c(t) - \hat f(t)]^2$$

for all $c(t) \in Y^r(t)$. Since

$$E[c(t) - \hat f(t)]^2 = E[c(t) - \tilde f(t)]^2 + E[\tilde f(t) - \hat f(t)]^2 + 2E[c(t) - \tilde f(t)][\tilde f(t) - \hat f(t)],$$

this will occur if and only if

$$E[c(t) - \tilde f(t)][\tilde f(t) - \hat f(t)] = 0 \qquad \forall\, c(t) \in Y^r(t). \tag{3.20}$$

Thus, we will demonstrate (3.20). Begin by noting that

$$\frac{dP_0}{dP}\Big|_{F_t^y} = \big(E_0\{L_t \mid F_t^y\}\big)^{-1}.$$

Then

$$E[c(t) - \tilde f(t)][\tilde f(t) - \hat f(t)] = E\Big\{\frac{(c(t) - \tilde f(t))\big[\tilde f(t) E_0\{L_t \mid F_t^y\} - E_0\{f(t) L_t \mid F_t^y\}\big]}{E_0\{L_t \mid F_t^y\}}\Big\} = E_0\big\{(c(t) - \tilde f(t))\big(\tilde f(t) E_0\{L_t \mid F_t^y\} - E_0\{f(t) L_t \mid F_t^y\}\big)\big\}. \tag{3.21}$$

Next note from (3.13) that $g_j(t)$ depends on kernels $k_n$ of at most order $j + r$. Thus

$$\tilde f(t) E_0\{L_t \mid F_t^y\} = \sum_{j=0}^{r} I_t^j(g_j(t)) + \sum_{j=r+1}^{3r} I_t^j(g_j(t)) + \tilde f(t) E_0\{L_t^{(2r+1)} \mid F_t^y\},$$

where the $g_j(t)$, $r + 1 \le j \le 3r$, are determined by the multiplication formula. Using the partial expansions of theorem 3.1, we then see that the expression $\tilde f(t) E_0\{L_t \mid F_t^y\} - E_0\{f(t) L_t \mid F_t^y\}$ appearing in (3.21) equals

$$\sum_{j=0}^{r} I_t^j(g_j(t) - \ell_j(t)) + \sum_{j=r+1}^{2r} I_t^j(g_j(t) - \ell_j(t)) + \sum_{j=2r+1}^{3r} I_t^j(g_j(t)) + \tilde f(t) E_0\{L_t^{(2r+1)} \mid F_t^y\} - E_0\{f(t) L_t^{(2r+1)} \mid F_t^y\}. \tag{3.22}$$

Since $y(\cdot)$ is Brownian on $(\Omega, P_0)$, multiple integrals of different orders are orthogonal on $(\Omega, P_0)$, and so if (3.22) is used in (3.21) we find, writing $c_j(t)$ for the kernels of $c(t)$,

$$(3.21) = [c_0(t) - a_0(t)](g_0(t) - \ell_0(t)) + \sum_{j=1}^{r} \int_0^t \cdots \int_0^{s_{j-1}} (c_j - a_j)(g_j - \ell_j)(t, s_1, \ldots, s_j)\,ds_j \cdots ds_1 + E_0\{(c(t) - \tilde f(t)) \tilde f(t) E_0(L_t^{(2r+1)} \mid F_t^y)\} - E_0\{(c(t) - \tilde f(t)) E_0(f(t) L_t^{(2r+1)} \mid F_t^y)\}. \tag{3.23}$$

The last two terms of (3.23) are zero by lemma 3.3. Thus, it is clear that (3.23), and hence (3.20), is zero iff

$$g_j = \ell_j, \qquad 0 \le j \le r.$$

This completes the proof.

The technique of theorem 3.2 extends to other problems as well. Suppose, for instance, that a filter

$$a'(t) = a_0'(t) + \sum_{j=1}^{q} I_t^j(a_j'(t))$$

of order $q$ is available; $a'(t)$ need not be the best qth order filter. Let $r > q$, and, rather than ask for the best rth order filter, let us seek the "best rth order correction" to $a'(t)$, i.e., the mean-square minimizing $a(t)$ of the form

$$a(t) = a'(t) + \sum_{j=q+1}^{r} I_t^j(a_j(t)),$$

where $a_j(t)$, $j = q + 1, \ldots, r$, are free to be chosen. Define the kernels $g_j(t)$ as before, but with $a_j(t)$ replaced by $a_j'(t)$ for $0 \le j \le q$.

Theorem 3.3. Let the hypotheses of theorem 3.2 hold. Then $a(t)$ is the best rth order correction to $a'(t)$ if and only if

$$g_j(t, s_1, \ldots, s_j) = E\{f(t)\, h(s_1) \cdots h(s_j)\}, \qquad q + 1 \le j \le r. \tag{3.24}$$

Proof. As before, it suffices to show that (3.24) holds iff

$$E[c(t) - a(t)][a(t) - \hat f(t)] = 0$$

for all $c(t) = a'(t) + \sum_{j=q+1}^{r} I_t^j(c_j(t))$. By the same calculations as in theorem 3.2,

$$E[c(t) - a(t)][a(t) - \hat f(t)] = E_0\big\{[c(t) - a(t)]\big[a(t) E_0\{L_t \mid F_t^y\} - E_0\{f(t) L_t \mid F_t^y\}\big]\big\}$$

$$= E_0\Big\{\sum_{j=q+1}^{r} I_t^j(c_j - a_j)\Big[\sum_{j=0}^{r} I_t^j(g_j(t) - \ell_j(t)) + \sum_{j=r+1}^{2r} I_t^j(g_j(t) - \ell_j(t)) + \sum_{j=2r+1}^{3r} I_t^j(g_j(t)) + a(t) E_0\{L_t^{(2r+1)} \mid F_t^y\} - E_0\{f(t) L_t^{(2r+1)} \mid F_t^y\}\Big]\Big\}$$

$$= \sum_{j=q+1}^{r} \int_0^t \cdots \int_0^{s_{j-1}} (c_j - a_j)(g_j - \ell_j)(t, s_1, \ldots, s_j)\,ds_j \cdots ds_1.$$

This equals zero iff $g_j = \ell_j$ for $q + 1 \le j \le r$.

Remark. Clearly, an analogous result holds for the case in which an arbitrary subset of $\{a_j(t)\}$ is given and the remainder are chosen so as to optimize the mean-square filter error. Thus, if $a_j$, $j \in \{j_1, \ldots, j_q\} \subset \{0, 1, \ldots, r\}$, are given, then the $\{a_j(t)\}$, $j \notin \{j_1, \ldots, j_q\}$, are optimally chosen iff $g_j = \ell_j$ for every $j \in \{0, 1, \ldots, r\} - \{j_1, \ldots, j_q\}$.

As a first example of theorem 3.2 let us compute the kernel equations for the best linear estimate $\tilde f(t) = a_0(t) + \int_0^t a_1(t,s)\,dy(s)$. From (3.13),

$$g_0(t) = a_0(t) + \int_0^t a_1(t,\sigma)\, E\,h(\sigma)\,d\sigma$$

$$g_1(t,s) = a_1(t,s) + \int_0^t a_1(t,\sigma)\, E[h(s) h(\sigma)]\,d\sigma + a_0(t)\, E\,h(s).$$

The kernel equations are then

$$a_0(t) + \int_0^t a_1(t,\sigma)\, E\,h(\sigma)\,d\sigma = E f(t)$$

$$a_0(t)\, E\,h(s) + a_1(t,s) + \int_0^t a_1(t,\sigma)\, E[h(s) h(\sigma)]\,d\sigma = E f(t) h(s),$$

or, eliminating $a_0(t)$ from the second equation,

$$a_0(t) + \int_0^t a_1(t,\sigma)\, E[h(\sigma)]\,d\sigma = E f(t)$$

$$a_1(t,s) + \int_0^t a_1(t,\sigma)\, \text{cov}[h(s), h(\sigma)]\,d\sigma = \text{cov}[f(t), h(s)]. \tag{3.25}$$

(3.25) is, of course, the well-known Wiener-Hopf type equation for optimal linear filtering. Before examining higher order examples, we will discuss the Kalman filter.


3.3 The Kalman filter

Consider the filtering problem in which $h(t,x) = H(t)x$ and $x(t)$ is a Gauss-Markov process arising as the solution of the system

$$dx(t) = F(t)x(t)\,dt + G(t)\,db(t),$$

where $x_0$ is a constant or a Gaussian r.v. independent of the Brownian motion $b(\cdot)$. The celebrated Kalman-Bucy theorem states that the optimal state estimator $\hat x(t) = E\{x(t) \mid F_t^y\}$ satisfies the equation

$$d\hat x(t) = F(t)\hat x(t)\,dt + P(t)H^T(t)\,[dy(t) - H(t)\hat x(t)\,dt], \qquad \hat x(0) = x_0, \tag{3.26}$$

where $P(t)$ is the solution of a deterministic Riccati equation. It follows that $\hat x(t)$ is, in fact, a linear functional of $y(\cdot)$; if $\Phi(t,s)$ denotes the state transition matrix of $F(t) - P(t)H^T(t)H(t)$, the solution to (3.26) is

$$\hat x(t) = \Phi(t,0)\,x_0 + \int_0^t \Phi(t,s) P(s) H^T(s)\,dy(s). \tag{3.27}$$

This simple, linear structure is not an immediate consequence of the expansion formulae of theorem 3.1, because, even in this case, both numerator and denominator series will be truly infinite sums. It is therefore of interest to see how $\hat x(t)$ can be derived from the general expansion. We will show that this can be done using theorem 3.2 and moment equalities for Gaussian random variables.

The most common proof of the Kalman-Bucy filter invokes the stochastic differential equation for the conditional moments (cf. Fujisaki, Kallianpur, Kunita [2]). In this approach, the equation for $\hat x(t)$ requires knowledge of $\widehat{x^2}(t)$, that for $\widehat{x^2}(t)$ knowledge of $\widehat{x^3}(t)$, and so on, thus leading to an infinite, coupled set of equations. To derive the Kalman-Bucy theorem, it must be independently argued that the conditional distribution of $x(t)$ given $F_t^y$ is Gaussian. Because of identities between different moments of Gaussian r.v.'s, this allows the moment equations to be truncated at $n = 2$ and leads to (3.26) and (3.27). By way of contrast, the derivation here will not require explicitly knowing the conditional density. For other methods of deriving the Kalman-Bucy filter, see Van Schuppen [20].

In the interest of computational simplicity, we will consider only the simplest case,

$$dx(t) = db(t), \qquad x(0) = 0$$

$$dy(t) = x(t)\,dt + dw(t), \qquad y(0) = 0, \tag{3.28}$$

where $b(\cdot)$ and $w(\cdot)$ are independent, standard Brownian motions. The techniques work also for the general case.


Theorem 3.4 $\hat x(t) = \int_0^t a(t,s)\,dy(s)$, where $a(t,s)$ satisfies the Wiener-Hopf equation

$$a(t,s) + \int_0^t a(t,\sigma)\, \min(s,\sigma)\,d\sigma = s, \qquad t \ge s.$$

Before presenting the proof we must recall the following moment identities (Miller [16], Marcus-Willsky [14]).

Lemma 3.5 Let $[z_1, \ldots, z_n]$ be a jointly Gaussian random vector. Then

$$E[z_1 \cdots z_n] = E z_1 \cdot E[z_2 \cdots z_n] + \sum_{j=2}^{n} \text{cov}[z_1, z_j]\; E\Big[\prod_{i \ne 1, j} z_i\Big].$$
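The identity of lemma 3.5 is a recursion: it reduces a product moment of order $n$ to moments of order $n - 1$ and $n - 2$. A minimal sketch of that recursion follows; the function name and argument layout are hypothetical and chosen only for illustration. Repeated indices are allowed, so powers such as $E z^4$ are covered as well.

```python
def gaussian_product_moment(idx, mean, cov):
    """E[prod_{i in idx} z_i] for a jointly Gaussian vector z.

    idx  : tuple of component indices (repeats allowed)
    mean : sequence of component means
    cov  : covariance matrix as a nested sequence
    Implements the recursion
        E[z_1 ... z_n] = E z_1 * E[z_2 ... z_n]
                         + sum_{j>=2} cov[z_1, z_j] * E[prod_{i != 1, j} z_i].
    """
    if not idx:
        return 1.0  # empty product
    first, rest = idx[0], idx[1:]
    total = mean[first] * gaussian_product_moment(rest, mean, cov)
    for pos, j in enumerate(rest):
        remaining = rest[:pos] + rest[pos + 1:]  # drop the factor paired with z_first
        total += cov[first][j] * gaussian_product_moment(remaining, mean, cov)
    return total
```

For zero means and four distinct indices the recursion reproduces the familiar pairing formula $c_{12}c_{34} + c_{13}c_{24} + c_{14}c_{23}$. The number of terms grows quickly with the order, so this form is meant for checking identities rather than for efficient computation.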


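Proof of theorem 3.4. Since $y(\cdot)$ is continuous and Gaussian, the set of polynomials in $y(\cdot)$ is dense in $L^2(\Omega, F_t^y, P)$ (Kallianpur [8]). Therefore, it suffices to show that $\int_0^t a(t,s)\,dy(s)$ is the best rth order estimate for every $r$, $1 \le r < \infty$. Since

$$E\Big(\int_0^t b^2(s)\,ds\Big)^r < \infty, \qquad E\, b^2(t)\Big(\int_0^t b^2(s)\,ds\Big)^r < \infty$$

for all $r$ and $t$, theorem 3.2 applies. That is, if $g_j(t, \ldots)$, $0 \le j < \infty$, are defined so that

$$\int_0^t a(t,s)\,dy(s)\, \sum_{i=0}^{\infty} I_t^i(k_i) = \sum_{i=0}^{\infty} I_t^i(g_i),$$

then $\int_0^t a(t,s)\,dy(s)$ is the best rth order estimate if and only if

$$g_j(t, s_1, \ldots, s_j) = E\{b(t)\, b(s_1) \cdots b(s_j)\}, \qquad 0 \le j \le r.$$

From (3.13), we may easily calculate

$$g_0(t) = 0$$

$$g_j(t, \cdots) = j\,\big(a(t,\cdot) \otimes(t)\, k_{j-1}\big)(\cdots) + \big(a(t,\cdot) \otimes_1(t)\, k_{j+1}\big)(\cdots), \qquad j > 0. \tag{3.29}$$

However,

$$j\,\big(a(t,\cdot) \otimes(t)\, k_{j-1}\big)(s_1, \ldots, s_j) = \frac{j}{j!} \sum_{\pi} a(t, s_{\pi(1)})\, E\Big\{\prod_{l=2}^{j} b(s_{\pi(l)})\Big\} = \sum_{i=1}^{j} a(t, s_i)\, E\Big\{\prod_{l \ne i} b(s_l)\Big\},$$

where the first sum runs over the permutations $\pi$ of $\{1, \ldots, j\}$, and

$$\big(a(t,\cdot) \otimes_1(t)\, k_{j+1}\big)(s_1, \ldots, s_j) = \int_0^t a(t,\sigma)\, E\{b(\sigma)\, b(s_1) \cdots b(s_j)\}\,d\sigma. \tag{3.30}$$

The kernel equations (3.29) become

$$0 = E\,b(t) \tag{3.31}$$

$$a(t,s) + \int_0^t a(t,\sigma)\, E\{b(\sigma) b(s)\}\,d\sigma = E\{b(t) b(s)\} \tag{3.32}$$

$$\sum_{i=1}^{j} a(t, s_i)\, E\Big\{\prod_{l \ne i} b(s_l)\Big\} + \int_0^t a(t,\sigma)\, E\Big\{b(\sigma) \prod_{l=1}^{j} b(s_l)\Big\}\,d\sigma = E\{b(t)\, b(s_1) \cdots b(s_j)\}, \qquad j \ge 2. \tag{3.33}$_j$

(3.31) is true by definition, and (3.32), by hypothesis. It remains to prove that (3.33)$_j$, $j \ge 2$, all hold. However, a direct application of lemma 3.5 shows that

$$E\Big\{b(\sigma) \prod_{l=1}^{j} b(s_l)\Big\} = \sum_{i=1}^{j} \min(\sigma, s_i)\, E\Big\{\prod_{l \ne i} b(s_l)\Big\}$$

for every $j$. Using this, the left hand side of (3.33)$_j$ becomes

$$\sum_{i=1}^{j} \Big\{a(t, s_i) + \int_0^t a(t,\sigma)\, \min(\sigma, s_i)\,d\sigma\Big\}\, E\Big\{\prod_{l \ne i} b(s_l)\Big\} = \sum_{i=1}^{j} \min(s_i, t)\, E\Big\{\prod_{l \ne i} b(s_l)\Big\} = E\{b(t)\, b(s_1) \cdots b(s_j)\},$$

where the first equality employs the hypothesis on $a(t,s)$, and the second employs lemma 3.5 again. Thus (3.33)$_j$ is true for all $j \ge 2$.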
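The Wiener-Hopf equation of theorem 3.4 can also be solved numerically and compared with the Kalman-Bucy solution. The sketch below is an illustration under stated assumptions, not part of the original argument: it discretizes the integral with the trapezoid rule (a Nystrom scheme) and compares against the closed form $a(t,s) = \sinh(s)/\cosh(t)$. That closed form comes from a side calculation for model (3.28): the Riccati solution is $P(t) = \tanh t$, so (3.27) gives $\Phi(t,s)P(s) = \sinh(s)/\cosh(t)$. The function names and grid size are hypothetical.

```python
import numpy as np


def solve_wiener_hopf(t, n=400):
    """Nystrom (trapezoid) solution of
        a(t,s) + int_0^t a(t,u) min(s,u) du = s,   0 <= s <= t,
    returned on a uniform grid of n + 1 points."""
    s = np.linspace(0.0, t, n + 1)
    w = np.full(n + 1, t / n)
    w[0] = w[-1] = t / (2 * n)           # trapezoid quadrature weights
    K = np.minimum.outer(s, s)           # kernel min(s,u) = E[b(s) b(u)]
    # (I + K W) a = s, where the right side is min(t, s) = s on [0, t]
    a = np.linalg.solve(np.eye(n + 1) + K * w, s)
    return s, a


def kalman_kernel(t, s):
    """Closed-form kernel sinh(s)/cosh(t) implied by (3.26)-(3.27) for model (3.28)."""
    return np.sinh(s) / np.cosh(t)


grid, a_num = solve_wiener_hopf(1.0)
max_err = np.max(np.abs(a_num - kalman_kernel(1.0, grid)))
```

With the default grid the discrepancy `max_err` should be at the level of the quadrature error, which gives a numerical cross-check that the kernel of the Kalman-Bucy filter solves the first-order kernel equation (3.32).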


3.4 Quadratic Filters

As a further example of the technique of section 3.2, we will present the optimal kernel equations for the quadratic case ($r = 2$) and sketch a theoretical approach to their solution. To guarantee validity of the discussion, assume throughout the hypotheses of theorem 3.2 for $r = 2$.

Deriving the optimal kernel equations is simply a matter of calculation. Let $\tilde f(t) = a_0(t) + \int_0^t a_1(t,s)\,dy(s) + \int_0^t \int_0^{s_1} a_2(t, s_1, s_2)\,dy(s_2)\,dy(s_1)$ and let $g_j(t, \cdots)$ be defined from $a_0$, $a_1$, $a_2$ in the manner indicated at (3.13). Thus

$$g_0(t) = a_0(t) + a_1(t) \otimes_1 k_1 + a_2(t,\cdot) \otimes_2 k_2 \tag{3.34}$$

$$g_1(t,s) = a_1(t,s) + a_0(t) k_1(s) + (a_1(t,\cdot) \otimes_1 k_2)(s) + (a_2(t,\cdot) \otimes_1 k_1)(s) + (a_2(t,\cdot) \otimes_2 k_3)(s) \tag{3.35}$$

$$g_2(t, s_1, s_2) = a_2(t, s_1, s_2) + a_0(t) k_2(s_1, s_2) + 2(a_1(t,\cdot) \otimes k_1)(s_1, s_2) + (a_1(t,\cdot) \otimes_1 k_3)(s_1, s_2) + 2(a_2(t,\cdot) \otimes_1 k_2)(s_1, s_2) + (a_2(t,\cdot) \otimes_2 k_4)(s_1, s_2). \tag{3.36}$$

(More properly, $\otimes$ in (3.34)-(3.36) should be $\otimes(t)$.) According to theorem 3.2, $\tilde f(t)$ is optimal quadratic iff $g_0$, $g_1$, and $g_2$ are, respectively, $E f(t)$, $E f(t) h(s)$ and $E f(t) h(s_1) h(s_2)$. Recalling the definition of $\otimes(t)$ from section 2 and $k_j = E\,h(s_1) \cdots h(s_j)$, we derive for the optimal kernel equations:

$$E f(t) = a_0(t) + \int_0^t a_1(t,s)\, E\,h(s)\,ds + \int_0^t \int_0^{s_1} a_2(t, s_1, s_2)\, E\,h(s_1) h(s_2)\,ds_2\,ds_1 \tag{3.37}$$

$$E f(t) h(s) = a_1(t,s) + a_0(t)\, E\,h(s) + \int_0^t a_1(t,\sigma)\, E\,h(\sigma) h(s)\,d\sigma + \int_0^t a_2(t, s, \sigma)\, E\,h(\sigma)\,d\sigma + \int_0^t \int_0^{\sigma_1} a_2(t, \sigma_1, \sigma_2)\, E\,h(\sigma_1) h(\sigma_2) h(s)\,d\sigma_2\,d\sigma_1 \tag{3.38}$$

$$E f(t) h(s_1) h(s_2) = a_2(t, s_1, s_2) + a_0(t)\, E\,h(s_1) h(s_2) + a_1(t, s_1)\, E\,h(s_2) + a_1(t, s_2)\, E\,h(s_1) + \int_0^t a_1(t,\sigma)\, E\,h(\sigma) h(s_1) h(s_2)\,d\sigma$$
$$\qquad + \int_0^t \big[a_2(t, s_1, \sigma)\, E\,h(\sigma) h(s_2) + a_2(t, s_2, \sigma)\, E\,h(\sigma) h(s_1)\big]\,d\sigma + \int_0^t \int_0^{\sigma_1} a_2(t, \sigma_1, \sigma_2)\, E\{h(s_1) h(s_2) h(\sigma_1) h(\sigma_2)\}\,d\sigma_2\,d\sigma_1. \tag{3.39}$$
These equations deserve some elementary remarks before we set about solving them. First, the optimal kernels are all interrelated in the general case. We cannot solve for $a_0$ and $a_1$ independently of knowing $a_2$. Likewise, if $a_0 = c_0$, $a_1 = c_1$ are the kernels of the best linear estimate, they will not, in general, be the lower order kernels of the best quadratic estimate. Secondly, the equations (3.37)-(3.39) can be used for other suboptimal designs in the spirit of theorem 3.3. Thus, if $a_0$ and $a_1$ are given, and we seek the best quadratic correction to $a_0(t) + \int_0^t a_1(t,s)\,dy(s)$, this will be found by solving (3.39) for $a_2$ in terms of $a_1$ and $a_0$. The methods developed for solving the full set of equations will also apply to the best correction problem.

As a system of integral equations, (3.37)-(3.39) looks complicated and contains unusual features. Nevertheless, we will show that solving the system can be reduced to two familiar tasks: solving a linear estimation problem and solving a Fredholm integral equation. The method behind this reduction is simply to eliminate $a_0$ and $a_1$ to obtain an equation for $a_2$. The basic steps are: 1) eliminate $a_0(t)$ from (3.38) to derive the integral equation (3.41) for $a_1$; 2) solve this for $a_1$ in terms of $a_2$ using the solution to the linear filtering problem (see (3.42)); 3) use (3.42) to eliminate $a_1$ from (3.39) and derive (3.43), an integral equation only involving the unknown $a_2$; and 4) turn (3.43) into the Fredholm equation (3.45). The central equation is thus (3.45). Once it is solved for $a_2$, $a_1$ and $a_0$ are found by using (3.42) and (3.37) respectively.

Let $R: L^2([0,t]) \to L^2([0,t])$ be the operator defined by

$$(R\beta)(s) = \int_0^t \text{cov}[h(s), h(\sigma)]\, \beta(\sigma)\,d\sigma. \tag{3.40}$$

The first step is easy; simply solve (3.37) for $a_0(t)$ and substitute the result in (3.38). We then derive

$$[(I + R)\, a_1(t,\cdot)](s) = \text{cov}[f(t), h(s)] - \int_0^t E\,h(\sigma)\, a_2(t, s, \sigma)\,d\sigma - \int_0^t \int_0^{\sigma_1} \text{cov}[h(s), h(\sigma_1) h(\sigma_2)]\, a_2(t, \sigma_1, \sigma_2)\,d\sigma_2\,d\sigma_1. \tag{3.41}$$

The next step, solving this for $a_1$, thus requires inverting $I + R$.

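Lemma 3.6

i) $h(s)$, $s \le t$, has a best linear estimate $\tilde h(s) = \alpha_0(s) + \int_0^s \alpha(s,\sigma)\,dy(\sigma)$. (As a convention, set $\alpha(s,\sigma) = 0$ for $0 \le s < \sigma \le t$.)

ii) $(I + R)^{-1} = I - Q$, where $Q$ is the integral operator with kernel

$$q(s_1, s_2) = \alpha(s_1, s_2) + \alpha(s_2, s_1) - \int_0^t \alpha(\sigma, s_1)\, \alpha(\sigma, s_2)\,d\sigma, \qquad 0 \le s_1, s_2 \le t.$$

Proof. We are assuming $E\big[\int_0^t h^2(s)\,ds\big]^4 < \infty$. This guarantees that $\tilde h(s)$ exists, and, as in (3.25),

$$\alpha(s_1, s_2) + \int_0^{s_1} \alpha(s_1, \sigma)\, \text{cov}[h(s_2), h(\sigma)]\,d\sigma = \text{cov}[h(s_1), h(s_2)], \qquad 0 \le s_2 \le s_1 \le t.$$

ii) is standard. See, for instance, Gessey [3].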

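This lemma can now be applied to solve (3.41) for $a_1(t,s)$. Extending $a_2(t, \cdot, \cdot)$ symmetrically to $[0,t]^2$, so that integrals over $\{\sigma_2 \le \sigma_1\}$ become one half of integrals over the square,

$$a_1(t,s) = \text{cov}[f(t), h(s)] - \int_0^t q(s,\sigma)\, \text{cov}[f(t), h(\sigma)]\,d\sigma - \int_0^t E\,h(\sigma)\, a_2(t, s, \sigma)\,d\sigma - \int_0^t \int_0^t \Gamma'(t, s, \sigma_1, \sigma_2)\, a_2(t, \sigma_1, \sigma_2)\,d\sigma_2\,d\sigma_1, \tag{3.42}$$

where

$$\Gamma'(t, s, \sigma_1, \sigma_2) = \tfrac12\, \text{cov}[h(s), h(\sigma_1) h(\sigma_2)] - \tfrac12 \{q(s, \sigma_2)\, E\,h(\sigma_1) + q(s, \sigma_1)\, E\,h(\sigma_2)\} - \tfrac12 \int_0^t q(s,u)\, \text{cov}[h(u), h(\sigma_1) h(\sigma_2)]\,du.$$

In deriving $\Gamma'$, advantage was taken of the (assumed) symmetry of $a_2(t, s_1, s_2)$ in $s_1$, $s_2$. Now, using (3.37), (3.41) and (3.42), we may eliminate $a_0$ and $a_1$ from equation (3.39). The result is

$$a_2(t, s_1, s_2) = \Gamma(t, s_1, s_2) - \int_0^t \big[\Gamma_1(s_1, \sigma)\, a_2(t, s_2, \sigma) + \Gamma_1(s_2, \sigma)\, a_2(t, s_1, \sigma)\big]\,d\sigma - \int_0^t \int_0^t \Gamma_2(t, s_1, s_2, \sigma_1, \sigma_2)\, a_2(t, \sigma_1, \sigma_2)\,d\sigma_2\,d\sigma_1, \tag{3.43}$$

where, writing $\text{cov}[u, v, w] := E(u - Eu)(v - Ev)(w - Ew)$,

$$\Gamma(t, s_1, s_2) = \text{cov}[f(t), h(s_1), h(s_2)] - \int_0^t \text{cov}[h(s_1), h(s_2), h(\sigma)] \Big(\text{cov}[f(t), h(\sigma)] - \int_0^t q(\sigma, \sigma_2)\, \text{cov}[f(t), h(\sigma_2)]\,d\sigma_2\Big)\,d\sigma$$

$$\Gamma_1(s, \sigma) = \text{cov}[h(s), h(\sigma)]$$

$$\Gamma_2(t, s_1, s_2, \sigma_1, \sigma_2) = \tfrac12\, \text{cov}[h(s_1), h(s_2), h(\sigma_1) h(\sigma_2)] - \tfrac12 \{\text{cov}[h(s_1), h(s_2), h(\sigma_1)]\, E\,h(\sigma_2) + \text{cov}[h(s_1), h(s_2), h(\sigma_2)]\, E\,h(\sigma_1)\} - \int_0^t \text{cov}[h(s_1), h(s_2), h(\eta)]\, \Gamma'(t, \eta, \sigma_1, \sigma_2)\,d\eta.$$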

It remains to solve (3.43) for $a_2$. This is simply a linear integral equation for $a_2$. However, its middle term, involving a partial tensor contraction between $a_2$ and $\Gamma_1$, is not standard, and the usual linear integral equation theory does not apply directly. Despite this, it is possible to rewrite (3.43) as a Fredholm integral equation and thereby to reduce the task of calculating $a_2$ to a familiar problem. First notice that (3.43) may be rewritten in the form

$$a_2(s_1, s_2) = \Gamma(s_1, s_2) - (R a_2(s_2, \cdot))(s_1) - (R a_2(s_1, \cdot))(s_2) - \int_0^t \int_0^t \Gamma_2(s_1, s_2, \sigma_1, \sigma_2)\, a_2(\sigma_1, \sigma_2)\,d\sigma_2\,d\sigma_1$$

or

$$[(I + R)\, a_2(s_2, \cdot)](s_1) = \Gamma(s_1, s_2) - (R a_2(s_1, \cdot))(s_2) - \int_0^t \int_0^t \Gamma_2(s_1, s_2, \sigma_1, \sigma_2)\, a_2(\sigma_1, \sigma_2)\,d\sigma_2\,d\sigma_1. \tag{3.44}$$

In these equations, the argument $t$ has been omitted for simplicity. Now apply $(I + R)^{-1}$ to both sides of (3.44). Again, an equation of the form

$$[(I + R)\, a_2(s_1, \cdot)](s_2) = \text{linear terms in } a_2$$

is obtained, but this time there are no partial tensor contractions of the form $(R a_2(s_j, \cdot))(s_i)$ on the right hand side. With a final application of $(I + R)^{-1} = I - Q$ to both sides, the following Fredholm equation for $a_2$ is derived:

$$a_2(t, s_1, s_2) = \bar\Gamma(t, s_1, s_2) + \int_0^t \int_0^t \gamma(t, s_1, s_2, \sigma_1, \sigma_2)\, a_2(t, \sigma_1, \sigma_2)\,d\sigma_2\,d\sigma_1, \tag{3.45}$$

where

$$\bar\Gamma(t, s_1, s_2) = \Gamma(t, s_1, s_2) - \int_0^t \big[q(s_2, \sigma)\, \Gamma(t, s_1, \sigma) + q(s_1, \sigma)\, \Gamma(t, \sigma, s_2)\big]\,d\sigma + \int_0^t \int_0^t q(s_1, \sigma_1)\, q(s_2, \sigma_2)\, \Gamma(t, \sigma_1, \sigma_2)\,d\sigma_2\,d\sigma_1$$

$$\gamma(t, s_1, s_2, \sigma_1, \sigma_2) = \gamma_1(t, s_1, s_2, \sigma_1, \sigma_2) - \int_0^t q(s_1, u)\, \gamma_1(t, u, s_2, \sigma_1, \sigma_2)\,du$$

$$\gamma_1(t, s_1, s_2, \sigma_1, \sigma_2) = -\Gamma_2(t, s_1, s_2, \sigma_1, \sigma_2) + q(s_2, \sigma_1)\, \Gamma_1(s_1, \sigma_2) + \int_0^t q(s_2, u)\, \Gamma_2(t, s_1, u, \sigma_1, \sigma_2)\,du.$$

Remarks. The viewpoint here is not recursive. Rather, $t$ is fixed throughout and integral operators are defined and inverted on $L^2([0,t])$ or $L^2([0,t]^2)$, and at a later time $t'$ the whole operation would have to be repeated. This poses an interesting question for further research. What structure on the moments $E\,h(s)$, $E f(t) h(s)$, etc., would allow a recursive solution to the quadratic kernel equations, in the sense that $a_2(t + dt, s_1, s_2)$ could be constructed in a simple way from $a_2(t, s_1, s_2)$? A related question is also important. When are the solutions $a_1$ and $a_2$ separable functions? If separability occurred, then, as mentioned above, the stochastic integrals $I_t^1(a_1)$ and $I_t^2(a_2)$ could be realized as the outputs of stochastic differential systems. Certainly, if $\bar\Gamma$ and $\gamma$ of the Fredholm equation for $a_2$ are separable, $a_2$ will be separable, but due to the complicated manner in which the moments $E f(t)$, $E f(t) h(s)$, etc., combine to produce $\bar\Gamma$ and $\gamma$, this does not lead to easy conditions. This issue is not pursued further.